Train Short, Test Long - Attention With Linear Biases Enables Input Length Extrapolation
Offered By: Yannic Kilcher via YouTube
Course Description
Overview
Explore the innovative ALiBi (Attention with Linear Biases) method for improving sequence extrapolation in transformer models. Dive into the limitations of traditional position encodings and discover how ALiBi's simple approach, which biases attention scores with a penalty proportional to the distance between query and key, allows efficient extrapolation to sequences longer than those seen during training. Learn about the implementation details, including how to choose the slope parameter, and examine experimental results demonstrating ALiBi's performance advantages. Gain insights into why this method leads to better outcomes and understand its potential impact on natural language processing tasks.
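To make the idea covered in the video concrete, here is a minimal PyTorch sketch of the ALiBi bias: instead of adding position embeddings, a fixed, distance-proportional penalty is added to the attention scores, with a per-head slope drawn from the geometric sequence described in the paper. The helper names (alibi_slopes, alibi_bias) and the usage snippet are illustrative, not taken from the paper's codebase.

```python
import torch

def alibi_slopes(num_heads):
    # Per-head slopes from the ALiBi paper: a geometric sequence starting at
    # 2^(-8/n) with ratio 2^(-8/n) (shown here for a power-of-two head count).
    start = 2 ** (-8 / num_heads)
    return torch.tensor([start ** (i + 1) for i in range(num_heads)])

def alibi_bias(num_heads, seq_len):
    # Relative distance (i - j) between each query position i and key position j.
    pos = torch.arange(seq_len)
    distance = pos.view(-1, 1) - pos.view(1, -1)            # (seq_len, seq_len)
    slopes = alibi_slopes(num_heads).view(-1, 1, 1)          # (heads, 1, 1)
    # Linearly penalize attention to more distant past tokens; no learned parameters.
    return -slopes * distance.clamp(min=0).float()           # (heads, seq_len, seq_len)

# Usage sketch: add the bias to the raw attention scores before the softmax,
# together with the usual causal mask.
scores = torch.randn(8, 128, 128)                            # (heads, q_len, k_len)
scores = scores + alibi_bias(num_heads=8, seq_len=128)
causal_mask = torch.triu(torch.ones(128, 128, dtype=torch.bool), diagonal=1)
scores = scores.masked_fill(causal_mask, float("-inf"))
attn = torch.softmax(scores, dim=-1)
```

Because the bias depends only on the query-key distance and not on an absolute position table, the same computation applies unchanged to sequence lengths never seen during training, which is what enables the "train short, test long" extrapolation.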
Syllabus
- Intro & Overview
- Position Encodings in Transformers
- Sinusoidal Position Encodings
- ALiBi Position Encodings
- How to choose the slope parameter
- Experimental Results
- Comments & Conclusion
Taught by
Yannic Kilcher
Related Courses
- Sequence Models - DeepLearning.AI via Coursera
- Modern Natural Language Processing in Python - Udemy
- Stanford Seminar - Transformers in Language: The Development of GPT Models Including GPT-3 - Stanford University via YouTube
- Long Form Question Answering in Haystack - James Briggs via YouTube
- Spotify's Podcast Search Explained - James Briggs via YouTube