Accelerating Transformers via Kernel Density Estimation - Google TechTalk
Offered By: Google TechTalks via YouTube
Course Description
Overview
Explore techniques for accelerating Transformers in this Google TechTalk presented by Insu Han. The talk examines why dot-product attention becomes a bottleneck on long sequences: its time and memory cost grow quadratically with sequence length. It then shows how kernel density estimation (KDE) can be used to approximate attention. Learn about KDEformer, which approximates attention in sub-quadratic time with provable spectral norm bounds. Examine experimental results comparing KDEformer with other attention approximations in accuracy, memory usage, and runtime on several pre-trained models. Gain insight into potential applications and future directions of this research for accelerating large language models and sequence modeling tasks.
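KDE connects to attention because each row of softmax attention is a ratio of sums of exponential kernels. The row normalizer is an exponential KDE of the keys evaluated at the query, and each output coordinate is the same KDE weighted by one column of V. The minimal pure-Python sketch below computes exact attention twice, once directly and once as ratios of exponential KDEs, and checks that the two agree. All helper names (`softmax_attention`, `exp_kde`, `kde_attention`, `sampled_normalizer`) are hypothetical. The last function is a plain uniform-sampling estimate of the normalizer. It is included only to illustrate how estimating each KDE from m << n samples avoids building the full n × n score matrix. It is an assumption for illustration and is not the estimator KDEformer uses; the talk's adaptive KDE algorithm is not reproduced here.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[Vector]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax_attention(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """Standard dot-product attention: O(n^2 d) time for n queries and n keys."""
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [dot(q, k) / math.sqrt(d) for k in K]
        m = max(scores)  # subtract the max score for numerical stability
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(wj * v[c] for wj, v in zip(w, V)) / z for c in range(len(V[0]))])
    return out


def exp_kde(q: Vector, K: Matrix, weights: Vector, d: int) -> float:
    """Weighted exponential KDE at q: sum_j weights[j] * exp(<q, k_j> / sqrt(d)).
    No max-subtraction here, so this sketch assumes small-magnitude inputs."""
    return sum(w * math.exp(dot(q, k) / math.sqrt(d)) for w, k in zip(weights, K))


def kde_attention(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """The same attention output, written as ratios of exponential KDEs."""
    d = len(Q[0])
    ones = [1.0] * len(K)
    out = []
    for q in Q:
        denom = exp_kde(q, K, ones, d)  # softmax row normalizer
        row = []
        for c in range(len(V[0])):
            col_weights = [v[c] for v in V]  # value column c used as (signed) KDE weights
            row.append(exp_kde(q, K, col_weights, d) / denom)
        out.append(row)
    return out


def sampled_normalizer(q: Vector, K: Matrix, m: int, rng: random.Random) -> float:
    """Hypothetical baseline: unbiased uniform-sampling estimate of the normalizer.
    Costs O(m d) per query instead of O(n d). NOT KDEformer's estimator."""
    d = len(q)
    n = len(K)
    idx = [rng.randrange(n) for _ in range(m)]  # sample key indices with replacement
    return (n / m) * sum(math.exp(dot(q, K[j]) / math.sqrt(d)) for j in idx)


if __name__ == "__main__":
    rng = random.Random(0)
    n, d = 6, 4
    Q = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(n)]
    K = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(n)]
    V = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(n)]
    A = softmax_attention(Q, K, V)
    B = kde_attention(Q, K, V)
    print("max |softmax - kde|:", max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb)))
    exact = exp_kde(Q[0], K, [1.0] * n, d)
    print("exact normalizer:", exact, "sampled:", sampled_normalizer(Q[0], K, 200, rng))
```

The exact reformulation alone does not save any work, because every KDE is still evaluated with a full sum over the keys. The speedup in methods of this kind comes from answering those KDE queries approximately, with fewer kernel evaluations per query.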
Syllabus
Intro
Outline for Efficient Transformer
Introduction
Transformer for Sequential Modeling
Transformer with Long Sequence
Contributions
High-level Approach
Weighted Exponential KDE
Adaptive KDE Algorithm
Algorithm Summary
Experiments
Conclusion
Future Work
Taught by
Google TechTalks
Related Courses
Applied Deep Learning: Build a Chatbot - Theory, Application (Udemy)
Can Wikipedia Help Offline Reinforcement Learning? - Paper Explained (Yannic Kilcher via YouTube)
Infinite Memory Transformer - Research Paper Explained (Yannic Kilcher via YouTube)
Recurrent Neural Networks and Transformers (Alexander Amini via YouTube)
MIT 6.S191 - Recurrent Neural Networks (Alexander Amini via YouTube)