Flash Attention Explained - Algorithm, Applications, and Performance
Offered By: Unify via YouTube
Course Description
Overview
Explore the Flash Attention algorithm with guest speaker Dan Fu, Stanford University researcher and co-author of the groundbreaking paper. Delve into this IO-aware attention mechanism, which computes exact self-attention in transformer-based models for natural language processing while sharply reducing reads and writes to GPU memory. Learn about the motivation behind Flash Attention, its downstream applications in histopathology, and its impact on memory footprint reduction. Examine empirical validations, benchmarks, and other applications such as long document classification and the Path X benchmark. Gain insights into hardware-efficient long convolutions, state space representations, and the interplay between hardware and algorithms in this comprehensive 57-minute video from Unify.
        
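The central idea the talk walks through is that the full N x N attention matrix never needs to be materialized: keys and values are processed one block at a time, and a running (online) softmax rescales the partial outputs so the result is still exact. Below is a minimal NumPy sketch of that tiling scheme, not the paper's fused CUDA kernel; the block size and tensor shapes are illustrative assumptions.

```python
import numpy as np

def naive_attention(Q, K, V):
    """Standard attention: materializes the full N x N score matrix."""
    S = Q @ K.T / np.sqrt(Q.shape[-1])            # O(N^2) memory
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def tiled_attention(Q, K, V, block=64):
    """FlashAttention-style sketch: visit K/V in blocks, keeping a running
    row-wise max and softmax denominator so the N x N matrix is never stored."""
    N, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros_like(Q)           # running (unnormalized) output
    m = np.full(N, -np.inf)        # running row-wise max of the scores
    l = np.zeros(N)                # running softmax denominator
    for j in range(0, K.shape[0], block):
        Kj, Vj = K[j:j+block], V[j:j+block]
        S = (Q @ Kj.T) * scale                   # only N x block scores live
        m_new = np.maximum(m, S.max(axis=-1))
        P = np.exp(S - m_new[:, None])
        correction = np.exp(m - m_new)           # rescale earlier partial sums
        l = l * correction + P.sum(axis=-1)
        O = O * correction[:, None] + P @ Vj
        m = m_new
    return O / l[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((256, 32)) for _ in range(3))
assert np.allclose(naive_attention(Q, K, V), tiled_attention(Q, K, V))
```

Because the rescaling makes each block update exact, the tiled version matches the naive one to floating-point precision; the speedups reported in the paper come from running this loop in fast on-chip SRAM instead of repeatedly touching GPU main memory.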
Syllabus
Introduction
Flash Attention
Motivation for Flash Attention
Downstream Applications
Histopathology
Outline
Attention
Memory Footprint
GPU Memory
Memory Footprint Reduction
Approximate Attention
FlashAttention
Sparsity Fraction
Empirical Validation
Benchmarks
Other Applications
Long Document Classification
Path X Benchmark
Hungry Hungry Hippos
Simple Hardware Efficient Long Convolutions
Summary
Question
State Space Representation
Loop Order
Speed vs Sequence Length
Hardware vs Algorithms
Hardware Software Codesign
Tensor Cores
Taught by
Unify
Related Courses
Introduction to Artificial Intelligence - Stanford University via Udacity
Natural Language Processing - Columbia University via Coursera
Probabilistic Graphical Models 1: Representation - Stanford University via Coursera
Computer Vision: The Fundamentals - University of California, Berkeley via Coursera
Learning from Data (Introductory Machine Learning course) - California Institute of Technology via Independent
