
Big Bird: Transformers for Longer Sequences

Offered By: Yannic Kilcher via YouTube

Tags

Transformers Courses, Artificial Intelligence Courses

Course Description

Overview

Explore a comprehensive video explanation of the BigBird paper, which introduces a novel sparse attention mechanism for transformers to handle longer sequences. Learn about the challenges of quadratic memory requirements in full attention models and how BigBird addresses this issue through a combination of random, window, and global attention. Discover the theoretical foundations, including universal approximation and Turing completeness, as well as the practical implications for NLP tasks such as question answering and summarization. Gain insights into the experimental parameters, structured block computations, and results that demonstrate BigBird's improved performance on various NLP tasks and its potential applications in genomics.
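
The combination of window, global, and random attention described above can be made concrete with a small sketch. The snippet below is an illustrative toy, not the paper's implementation: the function name bigbird_sparse_mask and all of its parameters (block_size, window_blocks, num_global_blocks, num_random_blocks) are assumptions chosen for the example. It builds a block-level attention mask from the three components and compares the number of attended pairs against full quadratic attention.

```python
import numpy as np

def bigbird_sparse_mask(seq_len, block_size=4, window_blocks=3,
                        num_global_blocks=1, num_random_blocks=2, seed=0):
    """Toy block-sparse attention mask combining window, global, and
    random attention, in the spirit of BigBird's three components.
    (Illustrative sketch only; parameter names are assumptions.)"""
    rng = np.random.default_rng(seed)
    num_blocks = seq_len // block_size
    # block_mask[i, j] = True means block i attends to block j.
    block_mask = np.zeros((num_blocks, num_blocks), dtype=bool)

    half = window_blocks // 2
    for i in range(num_blocks):
        # Window attention: each block attends to its neighbouring blocks.
        lo, hi = max(0, i - half), min(num_blocks, i + half + 1)
        block_mask[i, lo:hi] = True
        # Random attention: a few randomly chosen blocks per row.
        rand = rng.choice(num_blocks, size=num_random_blocks, replace=False)
        block_mask[i, rand] = True

    # Global attention: the first block(s) attend everywhere and are attended by all.
    block_mask[:num_global_blocks, :] = True
    block_mask[:, :num_global_blocks] = True

    # Expand the block-level mask to a token-level mask.
    ones = np.ones((block_size, block_size), dtype=int)
    return np.kron(block_mask.astype(int), ones).astype(bool)

mask = bigbird_sparse_mask(seq_len=64)
full_pairs = 64 * 64              # full attention: O(n^2) attended pairs
sparse_pairs = int(mask.sum())    # sparse attention: far fewer pairs
print(f"full: {full_pairs} pairs, sparse: {sparse_pairs} pairs")
```

Because the window, random, and global block counts stay fixed as the sequence grows, the number of attended pairs scales linearly with sequence length rather than quadratically, which is the memory saving the video discusses.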

Syllabus

- Intro & Overview
- Quadratic Memory in Full Attention
- Architecture Overview
- Random Attention
- Window Attention
- Global Attention
- Architecture Summary
- Theoretical Result
- Experimental Parameters
- Structured Block Computations
- Recap
- Experimental Results
- Conclusion


Taught by

Yannic Kilcher
