Transformers, Parallel Computation, and Logarithmic Depth
Offered By: Simons Institute via YouTube
Course Description
Overview
Explore the computational power of transformers in this 57-minute lecture by Daniel Hsu from Columbia University. Delve into the relationship between self-attention layers and communication rounds in Massively Parallel Computation. Discover how logarithmic depth enables transformers to efficiently solve complex computational tasks that challenge other neural sequence models and sub-quadratic transformer approximations. Gain insights into parallelism as a crucial distinguishing feature of transformers. Learn about the collaborative research with Clayton Sanford from Google and Matus Telgarsky from NYU, focusing on the simulation capabilities between constant numbers of self-attention layers and communication rounds in Massively Parallel Computation.
Syllabus
Transformers, parallel computation, and logarithmic depth
Taught by
Simons Institute
Related Courses
Transformers: Text Classification for NLP Using BERTLinkedIn Learning TensorFlow: Working with NLP
LinkedIn Learning TransGAN - Two Transformers Can Make One Strong GAN - Machine Learning Research Paper Explained
Yannic Kilcher via YouTube Nyströmformer- A Nyström-Based Algorithm for Approximating Self-Attention
Yannic Kilcher via YouTube Recreate Google Translate - Model Training
Edan Meyer via YouTube