
Train Short, Test Long - Attention With Linear Biases Enables Input Length Extrapolation

Offered By: Yannic Kilcher via YouTube

Tags

Attention Mechanisms, Deep Learning, Transformer Models

Course Description

Overview

Explore the innovative ALiBi (Attention with Linear Biases) method for improving sequence extrapolation in transformer models. Dive into the limitations of traditional position encodings and discover how ALiBi's simple yet effective approach allows efficient extrapolation to sequences longer than those seen during training. Learn about the implementation details, including how to choose the slope parameter, and examine experimental results demonstrating ALiBi's performance advantages. Gain insights into why this method leads to better outcomes and understand its potential impact on natural language processing tasks.
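
The overview above touches on how ALiBi biases attention scores and how the per-head slopes are chosen. As a rough companion to the video, here is a minimal NumPy sketch of that idea, following the paper's formulation: a fixed, non-learned linear penalty proportional to the query-key distance is added to the attention scores, with one slope per head drawn from a geometric sequence. The function names, array shapes, and the NumPy setting are illustrative choices, not code from the course.

```python
import numpy as np

def alibi_slopes(n_heads):
    # Geometric sequence starting at 2**(-8/n_heads) with the same ratio
    # (the paper's recipe when n_heads is a power of two).
    start = 2 ** (-8 / n_heads)
    return np.array([start ** (h + 1) for h in range(n_heads)])

def alibi_bias(seq_len, n_heads):
    # bias[h, i, j] = slope_h * (j - i): zero on the diagonal, increasingly
    # negative as key position j moves further into the past of query i.
    pos = np.arange(seq_len)
    dist = pos[None, :] - pos[:, None]                 # (seq, seq), j - i
    slopes = alibi_slopes(n_heads)                     # (heads,)
    return slopes[:, None, None] * dist[None, :, :]    # (heads, seq, seq)

# Toy usage: add the bias to the raw attention scores before the softmax,
# together with the usual causal mask.
heads, seq = 8, 16
scores = np.random.randn(heads, seq, seq)              # dummy q·k^T scores
causal = np.tril(np.ones((seq, seq), dtype=bool))
biased = np.where(causal, scores + alibi_bias(seq, heads), -np.inf)
```

Because the penalty grows linearly with distance and involves no learned position embeddings, the same biases apply unchanged to sequences longer than any seen during training, which is what enables the extrapolation discussed in the video.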

Syllabus

- Intro & Overview
- Position Encodings in Transformers
- Sinusoidal Position Encodings
- ALiBi Position Encodings
- How to choose the slope parameter
- Experimental Results
- Comments & Conclusion


Taught by

Yannic Kilcher

Related Courses

Neural Networks for Machine Learning
University of Toronto via Coursera
機器學習技法 (Machine Learning Techniques)
National Taiwan University via Coursera
Machine Learning Capstone: An Intelligent Application with Deep Learning
University of Washington via Coursera
Прикладные задачи анализа данных (Applied Problems of Data Analysis)
Moscow Institute of Physics and Technology via Coursera
Leading Ambitious Teaching and Learning
Microsoft via edX