YoVDO

Longformer - The Long-Document Transformer

Offered By: Yannic Kilcher via YouTube

Tags

Transformer Models Courses, Deep Learning Courses

Course Description

Overview

Explore a comprehensive video analysis of the Longformer, an extension of the Transformer model designed to process long documents. Delve into the key concepts of sliding window attention and sparse global attention, which make it possible to handle sequences thousands of tokens long. Examine how this architecture overcomes the quadratic scaling of traditional self-attention, whose compute and memory grow with the square of the sequence length. Learn about the model's performance on character-level language modeling, where it achieves state-of-the-art results on the text8 and enwik8 datasets. Discover the Longformer's effectiveness when pretrained and then fine-tuned on a variety of downstream tasks, consistently outperforming RoBERTa on long-document tasks. Gain insight into the architecture's combination of local windowed attention with task-motivated global attention, and understand the significance of this advance for natural language processing on extensive documents.
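The attention pattern described above can be sketched as a boolean mask: each token attends to a fixed-size local window around itself, while a few designated tokens get full (global) attention. This is a minimal illustrative sketch, not the Longformer implementation itself; the function name, window size, and choice of global positions are assumptions for the example.

```python
import numpy as np

def longformer_mask(seq_len, window, global_idx):
    """Hypothetical sketch of a Longformer-style attention mask:
    a sliding local window plus full attention at selected positions."""
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        lo = max(0, i - window // 2)
        hi = min(seq_len, i + window // 2 + 1)
        mask[i, lo:hi] = True          # local sliding window
    for g in global_idx:
        mask[g, :] = True              # global token attends to everything
        mask[:, g] = True              # and every token attends to it
    return mask

# Example: 16 tokens, window of ~5 neighbors, position 0 global
# (analogous to a [CLS]-style task token).
mask = longformer_mask(seq_len=16, window=4, global_idx=[0])
# The number of allowed attention pairs grows roughly linearly with
# sequence length, unlike the seq_len**2 pairs of full self-attention.
print(mask.sum(), "of", 16 * 16, "pairs allowed")
```

Because the local window is fixed, the allowed pairs scale as O(n·w) plus a small global term, which is the key to handling documents with thousands of tokens.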

Syllabus

Introduction
Problem
Transformer Model
Keys and Queries
Convolutional Network
Dilated Window
Global Attention


Taught by

Yannic Kilcher

Related Courses

Neural Networks for Machine Learning
University of Toronto via Coursera
機器學習技法 (Machine Learning Techniques)
National Taiwan University via Coursera
Machine Learning Capstone: An Intelligent Application with Deep Learning
University of Washington via Coursera
Прикладные задачи анализа данных (Applied Problems of Data Analysis)
Moscow Institute of Physics and Technology via Coursera
Leading Ambitious Teaching and Learning
Microsoft via edX