Fastformer - Additive Attention Can Be All You Need

Offered By: Yannic Kilcher via YouTube

Tags

Machine Learning Courses, Data Science Courses, Transformer Models Courses

Course Description

Overview

This 36-minute video analyzes Fastformer, a proposed efficient Transformer model for text understanding. It walks through the architecture's key components, additive attention and element-wise multiplication of queries and keys, and explains how they are meant to give complexity that is linear in sequence length rather than quadratic, as in classic attention. The video compares Fastformer to classic attention mechanisms, examines potential problems with the architecture (including redundant modules and whether the mechanism is really attention at all), and evaluates the reported experimental results. It also places the model within ongoing research on making Transformers handle long contexts efficiently.
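To make the two components named above concrete, here is a minimal NumPy sketch of a single Fastformer-style head. It is an illustration under stated assumptions, not the authors' implementation: the function names (`additive_pool`, `fastformer_head`, `init_params`), the parameter dictionary keys, and the random initialization are all hypothetical, and details such as multiple heads and parameter sharing are left out.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array of scores.
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def additive_pool(vectors, w):
    # Additive attention: score each of the N vectors with one learned
    # vector w, then return their weighted sum as a single global vector.
    d = vectors.shape[1]
    scores = vectors @ w / np.sqrt(d)   # shape (N,)
    weights = softmax(scores)           # shape (N,), sums to 1
    return weights @ vectors            # shape (d,)

def fastformer_head(x, params):
    # x has shape (N, d): N tokens, d features per token.
    q = x @ params["Wq"]
    k = x @ params["Wk"]
    v = x @ params["Wv"]

    global_q = additive_pool(q, params["w_q"])   # one global query, (d,)
    p = global_q * k                             # query-key element-wise product, (N, d)
    global_k = additive_pool(p, params["w_k"])   # one global key, (d,)
    u = global_k * v                             # key-value element-wise product, (N, d)

    # Output transform plus a residual connection to the queries.
    return u @ params["Wo"] + q

def init_params(d, seed=0):
    # Hypothetical random initialization, used only for this illustration.
    rng = np.random.default_rng(seed)
    params = {name: rng.normal(size=(d, d)) / np.sqrt(d)
              for name in ("Wq", "Wk", "Wv", "Wo")}
    params["w_q"] = rng.normal(size=d)
    params["w_k"] = rng.normal(size=d)
    return params

if __name__ == "__main__":
    tokens = np.random.default_rng(1).normal(size=(6, 4))
    out = fastformer_head(tokens, init_params(4))
    print(out.shape)
```

Each pooling step makes one pass over the N token vectors and no N-by-N score matrix is ever formed, which is where the linear cost in sequence length comes from. It also shows why the video asks whether this is attention at all: tokens interact only through the two pooled global vectors, not pairwise.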

Syllabus

- Intro & Outline
- Fastformer description
- Baseline: Classic Attention
- Fastformer architecture
- Additive Attention
- Query-Key element-wise multiplication
- Redundant modules in Fastformer
- Problems with the architecture
- Is this even attention?
- Experimental Results
- Conclusion & Comments


Taught by

Yannic Kilcher

Related Courses

- Sequence Models (DeepLearning.AI via Coursera)
- Modern Natural Language Processing in Python (Udemy)
- Stanford Seminar - Transformers in Language: The Development of GPT Models Including GPT-3 (Stanford University via YouTube)
- Long Form Question Answering in Haystack (James Briggs via YouTube)
- Spotify's Podcast Search Explained (James Briggs via YouTube)