Long-context Attention in Near-Linear Time
Offered By: Simons Institute via YouTube
Course Description
Overview
Explore a lecture on "Long-context Attention in Near-Linear Time" presented by David Woodruff of Carnegie Mellon University at the Simons Institute. Delve into the "HyperAttention" mechanism, designed to tackle the computational cost of attention in Large Language Models (LLMs) with long contexts. Discover how the approach achieves linear-time sampling even when the attention matrix has unbounded entries or large stable rank, provided two newly introduced hardness parameters are small. Learn about the modular design of HyperAttention and its compatibility with other fast implementations such as FlashAttention. Examine the empirical performance of the technique across long-context datasets, including its reduction of ChatGLM2's inference time and the larger speedups obtained at longer context lengths. Gain insight into the collaborative research behind this advance in the field of sublinear algorithms.
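The overview above centers on replacing the quadratic attention computation with sampling of the attention matrix. The sketch below is a minimal, hypothetical NumPy illustration of that general idea (uniform sampling of key/value rows, with the same sample reused to estimate the softmax normalization); it is not the HyperAttention algorithm presented in the lecture, and the function and parameter names are invented for illustration.

```python
import numpy as np

def sampled_attention(Q, K, V, m, rng=None):
    """Approximate softmax attention using m uniformly sampled key/value rows.

    The sampled scores feed both the softmax numerator and its normalizing
    row sums, giving a ratio estimator of each output row. Illustrative only.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = K.shape
    idx = rng.choice(n, size=m, replace=False)   # subsample key/value rows
    S = Q @ K[idx].T / np.sqrt(d)                # scores against the sample only
    S -= S.max(axis=1, keepdims=True)            # stabilize the exponentials
    W = np.exp(S)
    return (W @ V[idx]) / W.sum(axis=1, keepdims=True)

# Example: approximate attention over an 8192-token context from 512 samples.
rng = np.random.default_rng(0)
Q = rng.standard_normal((8192, 64))
K = rng.standard_normal((8192, 64))
V = rng.standard_normal((8192, 64))
out = sampled_attention(Q, K, V, m=512, rng=rng)
print(out.shape)  # (8192, 64)
```

For a fixed sample size m, the cost is O(n·m·d) rather than O(n²·d), which is the kind of near-linear scaling in context length n that the lecture's mechanism targets.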
Syllabus
Long-context Attention in Near-Linear Time
Taught by
Simons Institute
Related Courses
Mining Massive Datasets (Stanford University via edX)
Building Features from Text Data (Pluralsight)
Locality Sensitive Hashing for Search with Shingling + MinHashing - Python (James Briggs via YouTube)
Private Nearest Neighbor Search with Sublinear Communication and Malicious Security (IEEE via YouTube)
Time Signature Based Matching for Data Fusion and Coordination Detection in Cyber Relevant Logs (0xdade via YouTube)