
Big Bird: Transformers for Longer Sequences

Offered By: Yannic Kilcher via YouTube

Tags

Transformers Courses, Artificial Intelligence Courses

Course Description

Overview

Explore a comprehensive video explanation of the BigBird paper, which introduces a novel sparse attention mechanism for transformers to handle longer sequences. Learn about the challenges of quadratic memory requirements in full attention models and how BigBird addresses this issue through a combination of random, window, and global attention. Discover the theoretical foundations, including universal approximation and Turing completeness, as well as the practical implications for NLP tasks such as question answering and summarization. Gain insights into the experimental parameters, structured block computations, and results that demonstrate BigBird's improved performance on various NLP tasks and its potential applications in genomics.
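
The combination of window, global, and random attention described above can be made concrete with a small sketch. The snippet below is an illustrative toy, not the paper's implementation: the function name bigbird_sparse_mask and all of its parameters (block_size, window_blocks, num_global_blocks, num_random_blocks) are assumptions chosen for the example. It builds a block-level attention mask from the three components and compares the number of attended pairs against full quadratic attention.

```python
import numpy as np

def bigbird_sparse_mask(seq_len, block_size=4, window_blocks=3,
                        num_global_blocks=1, num_random_blocks=2, seed=0):
    """Toy block-sparse attention mask combining window, global, and
    random attention, in the spirit of BigBird's three components.
    (Illustrative sketch only; parameter names are assumptions.)"""
    rng = np.random.default_rng(seed)
    num_blocks = seq_len // block_size
    # block_mask[i, j] = True means block i attends to block j.
    block_mask = np.zeros((num_blocks, num_blocks), dtype=bool)

    half = window_blocks // 2
    for i in range(num_blocks):
        # Window attention: each block attends to its neighbouring blocks.
        lo, hi = max(0, i - half), min(num_blocks, i + half + 1)
        block_mask[i, lo:hi] = True
        # Random attention: a few randomly chosen blocks per row.
        rand = rng.choice(num_blocks, size=num_random_blocks, replace=False)
        block_mask[i, rand] = True

    # Global attention: the first block(s) attend everywhere and are attended by all.
    block_mask[:num_global_blocks, :] = True
    block_mask[:, :num_global_blocks] = True

    # Expand the block-level mask to a token-level mask.
    ones = np.ones((block_size, block_size), dtype=int)
    return np.kron(block_mask.astype(int), ones).astype(bool)

mask = bigbird_sparse_mask(seq_len=64)
full_pairs = 64 * 64              # full attention: O(n^2) attended pairs
sparse_pairs = int(mask.sum())    # sparse attention: far fewer pairs
print(f"full: {full_pairs} pairs, sparse: {sparse_pairs} pairs")
```

Because the window, random, and global block counts stay fixed as the sequence grows, the number of attended pairs scales linearly with sequence length rather than quadratically, which is the memory saving the video discusses.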

Syllabus

- Intro & Overview
- Quadratic Memory in Full Attention
- Architecture Overview
- Random Attention
- Window Attention
- Global Attention
- Architecture Summary
- Theoretical Result
- Experimental Parameters
- Structured Block Computations
- Recap
- Experimental Results
- Conclusion


Taught by

Yannic Kilcher
