Flash Attention 2.0 with Tri Dao - Discord Server Talks

Offered By: Aleksa Gordić - The AI Epiphany via YouTube

Tags

Attention Mechanisms Courses
Machine Learning Courses

Course Description

Overview

Dive into a comprehensive Discord server talk featuring Tri Dao from Stanford, discussing his work on Flash Attention 2.0. Explore the motivation behind modeling long sequences, get a brief recap of attention mechanisms, and understand the memory bottleneck and IO-awareness challenges they pose. Learn about the improvements in Flash Attention 2.0, including the behind-the-scenes refactor built on CUTLASS 3, and discover future directions in this field. Engage with an informative Q&A session to deepen your understanding of this cutting-edge work in machine learning systems.
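
To make the attention recap and the memory bottleneck concrete, here is a minimal PyTorch sketch (illustrative, not taken from the talk): the naive implementation materializes the full N x N score matrix per head, while torch.nn.functional.scaled_dot_product_attention can dispatch to a fused FlashAttention kernel on supported GPUs. The shapes and sizes below are assumptions for demonstration only.

```python
# Minimal sketch (illustrative, not from the talk): standard attention builds an
# N x N score matrix per head, which is the O(N^2) memory bottleneck that
# FlashAttention's IO-aware, tiled kernel avoids.
import math
import torch
import torch.nn.functional as F

def naive_attention(q, k, v):
    # q, k, v: (batch, heads, seq_len, head_dim)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])  # (B, H, N, N) held in memory
    return torch.softmax(scores, dim=-1) @ v

B, H, N, D = 1, 8, 1024, 64  # illustrative sizes
q, k, v = (torch.randn(B, H, N, D) for _ in range(3))

out_naive = naive_attention(q, k, v)
# Fused path: PyTorch >= 2.0 selects a FlashAttention backend when available
# (e.g. fp16/bf16 tensors on a supported CUDA GPU); otherwise it falls back to
# a math implementation, so the call works on CPU too.
out_fused = F.scaled_dot_product_attention(q, k, v)
print((out_naive - out_fused).abs().max())  # should be ~0 up to floating-point error
```

The standalone flash-attn package discussed in the talk exposes its kernels through a similar interface, but the built-in PyTorch call above is likely the simplest way to try a fused, IO-aware attention path.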

Syllabus

Main talk starts - intro & motivation
Behind the scenes: how Tri got started with Flash Attention
Motivation: modeling long sequences
Brief recap of attention
Memory bottleneck, IO awareness
Flash Attention 2.0 improvements
Behind the scenes: the Flash Attention 2.0 refactor using CUTLASS 3
Future directions
Q&A


Taught by

Aleksa Gordić - The AI Epiphany

Related Courses

Deep Learning for Natural Language Processing
University of Oxford via Independent
Sequence Models
DeepLearning.AI via Coursera
Deep Learning Part 1 (IITM)
Indian Institute of Technology Madras via Swayam
Deep Learning - Part 1
Indian Institute of Technology, Ropar via Swayam
Deep Learning - IIT Ropar
Indian Institute of Technology, Ropar via Swayam