Longformer: The Long-Document Transformer
Offered By: Yannic Kilcher via YouTube
Course Description
Overview
Explore a comprehensive video analysis of the Longformer, an extension of the Transformer model designed to process long documents. Delve into the key concepts of sliding window attention and sparse global attention, which scale linearly with sequence length and enable the handling of sequences with thousands of tokens. Examine how this attention pattern overcomes the quadratic scaling limitation of standard self-attention. Learn about the model's performance in character-level language modeling, where it achieves state-of-the-art results on the text8 and enwik8 datasets. Discover the Longformer's effectiveness when pretrained and fine-tuned on downstream tasks, consistently outperforming RoBERTa on long-document tasks. Gain insights into the model's architecture, which combines local windowed attention with task-motivated global attention. Understand the significance of this advancement in natural language processing and its potential applications in handling extensive documents.
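As a rough illustration of the attention patterns discussed in the video, the sketch below builds the combined attention mask: a (optionally dilated) sliding window plus a few global tokens. This is a minimal NumPy example, not the authors' implementation; the function name, the window and dilation parameters, and the choice of position 0 as a global token are illustrative assumptions.

    import numpy as np

    def longformer_mask(seq_len: int, window: int, dilation: int = 1,
                        global_idx=()) -> np.ndarray:
        """Boolean mask: entry [i, j] is True where query i may attend to key j."""
        i = np.arange(seq_len)[:, None]
        j = np.arange(seq_len)[None, :]
        dist = np.abs(i - j)
        # Sliding window: each token sees `window` neighbors per side;
        # dilation > 1 inserts gaps, widening the receptive field without
        # adding attention pairs.
        mask = (dist <= window * dilation) & (dist % dilation == 0)
        for g in global_idx:
            mask[g, :] = True   # global token attends to every position
            mask[:, g] = True   # every position attends to the global token
        return mask

    # e.g. a [CLS]-style global token at position 0
    mask = longformer_mask(seq_len=16, window=2, global_idx=[0])
    print(int(mask.sum()), "of", 16 * 16, "attention pairs kept")

With a fixed window size, the number of kept attention pairs grows as O(n x w) in the sequence length n rather than O(n^2), which is the linear scaling described above.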
Syllabus
Introduction
Problem
Transformer Model
Keys and Queries
Convolutional Network
Dilated Window
Global Attention
Taught by
Yannic Kilcher
Related Courses
Neural Networks for Machine Learning - University of Toronto via Coursera
Machine Learning Techniques (機器學習技法) - National Taiwan University via Coursera
Machine Learning Capstone: An Intelligent Application with Deep Learning - University of Washington via Coursera
Applied Problems in Data Analysis (Прикладные задачи анализа данных) - Moscow Institute of Physics and Technology via Coursera
Leading Ambitious Teaching and Learning - Microsoft via edX