Longformer: The Long-Document Transformer
Offered By: Yannic Kilcher via YouTube
Course Description
Overview
Explore a comprehensive video analysis of the Longformer, an extension of the Transformer model designed to process long documents. Delve into the key concepts of sliding window attention and sparse global attention, which scale linearly with sequence length and enable the handling of sequences with thousands of tokens. Examine how this attention pattern overcomes the quadratic scaling limitation of standard self-attention. Learn about the model's performance in character-level language modeling, where it achieves state-of-the-art results on the text8 and enwik8 datasets. Discover the Longformer's effectiveness when pretrained and fine-tuned on downstream tasks, consistently outperforming RoBERTa on long-document tasks. Gain insights into the model's architecture, which combines local windowed attention with task-motivated global attention. Understand the significance of this advancement in natural language processing and its potential applications in handling extensive documents.
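As a rough illustration of the attention patterns discussed in the video, the sketch below builds the combined attention mask: a (optionally dilated) sliding window plus a few global tokens. This is a minimal NumPy example, not the authors' implementation; the function name, the window and dilation parameters, and the choice of position 0 as a global token are illustrative assumptions.

    import numpy as np

    def longformer_mask(seq_len: int, window: int, dilation: int = 1,
                        global_idx=()) -> np.ndarray:
        """Boolean mask: entry [i, j] is True where query i may attend to key j."""
        i = np.arange(seq_len)[:, None]
        j = np.arange(seq_len)[None, :]
        dist = np.abs(i - j)
        # Sliding window: each token sees `window` neighbors per side;
        # dilation > 1 inserts gaps, widening the receptive field without
        # adding attention pairs.
        mask = (dist <= window * dilation) & (dist % dilation == 0)
        for g in global_idx:
            mask[g, :] = True   # global token attends to every position
            mask[:, g] = True   # every position attends to the global token
        return mask

    # e.g. a [CLS]-style global token at position 0
    mask = longformer_mask(seq_len=16, window=2, global_idx=[0])
    print(int(mask.sum()), "of", 16 * 16, "attention pairs kept")

With a fixed window size, the number of kept attention pairs grows as O(n x w) in the sequence length n rather than O(n^2), which is the linear scaling described above.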
Syllabus
Introduction
Problem
Transformer Model
Keys and Queries
Convolutional Network
Dilated Window
Global Attention
Taught by
Yannic Kilcher
Related Courses
Neural Networks for Machine Learning - University of Toronto via Coursera
Machine Learning Techniques (機器學習技法) - National Taiwan University via Coursera
Machine Learning Capstone: An Intelligent Application with Deep Learning - University of Washington via Coursera
Applied Problems in Data Analysis (Прикладные задачи анализа данных) - Moscow Institute of Physics and Technology via Coursera
Leading Ambitious Teaching and Learning - Microsoft via edX