Sparse Is Enough in Scaling Transformers - ML Research Paper Explained

Offered By: Yannic Kilcher via YouTube

Tags

- Machine Learning Courses
- Inference Courses

Course Description

Overview

This video lecture walks through the research paper "Sparse is Enough in Scaling Transformers". The paper studies sparse variants of every Transformer layer, including the sparse feedforward and QKV layers, and uses them to build Scaling Transformers, a family of models that scale efficiently and perform unbatched decoding faster than standard Transformers as model size grows. The lecture also covers the Terraformer architecture, which leverages sparsity in Transformer blocks to speed up inference while maintaining accuracy and reducing memory consumption. It closes with the experimental results and conclusions, including evidence that sparse layers achieve competitive performance on long text summarization tasks.
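To make the sparse feedforward idea concrete, here is a minimal NumPy sketch, not the paper's implementation. It assumes the hidden layer is split into fixed-size blocks and that a controller keeps only the highest-scoring unit in each block, so inference touches only the selected weight columns and rows. The names `sparse_ffn`, `masked_dense_ffn`, `controller`, and `block_size` are hypothetical, and the dense controller matrix is a simplification chosen for brevity.

```python
import numpy as np

def sparse_ffn(x, w1, w2, controller, block_size):
    """Feedforward pass that evaluates one hidden unit per block.

    x: (d_model,), w1: (d_model, d_ff), w2: (d_ff, d_model),
    controller: (d_model, d_ff). All shapes are illustrative assumptions.
    """
    d_ff = w1.shape[1]
    n_blocks = d_ff // block_size
    # The controller scores every hidden unit; scores are grouped by block.
    logits = (x @ controller).reshape(n_blocks, block_size)
    # Keep the highest-scoring unit of each block (global index into d_ff).
    winners = logits.argmax(axis=1) + np.arange(n_blocks) * block_size
    # Only the selected columns of w1 and rows of w2 are read.
    hidden = np.maximum(0.0, x @ w1[:, winners])
    return hidden @ w2[winners, :], winners

def masked_dense_ffn(x, w1, w2, winners):
    """Reference: full dense pass with non-selected units zeroed out."""
    mask = np.zeros(w1.shape[1])
    mask[winners] = 1.0
    hidden = np.maximum(0.0, x @ w1) * mask
    return hidden @ w2

rng = np.random.default_rng(0)
d_model, d_ff, block_size = 8, 32, 4
x = rng.normal(size=d_model)
w1 = rng.normal(size=(d_model, d_ff))
w2 = rng.normal(size=(d_ff, d_model))
controller = rng.normal(size=(d_model, d_ff))

y_sparse, winners = sparse_ffn(x, w1, w2, controller, block_size)
y_dense = masked_dense_ffn(x, w1, w2, winners)
print(np.allclose(y_sparse, y_dense))
```

The sparse path multiplies by `d_ff / block_size` columns instead of all `d_ff`, which is where the decoding speedup would come from in this simplified picture. The masked dense pass is included only to check that both paths agree.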

Syllabus

- Intro & Overview
- Recap: Transformer stack
- Sparse Feedforward layer
- Sparse QKV Layer
- Terraformer architecture
- Experimental Results & Conclusion


Taught by

Yannic Kilcher

Related Courses

- Discrete Inference and Learning in Artificial Vision (École Centrale Paris via Coursera)
- Teaching Literacy Through Film (The British Film Institute via FutureLearn)
- Linear Regression and Modeling (Duke University via Coursera)
- Probability and Statistics (Stanford University via Stanford OpenEdx)
- Statistical Reasoning (Stanford University via Stanford OpenEdx)