
Transformers, Parallel Computation, and Logarithmic Depth

Offered By: Simons Institute via YouTube

Tags

Transformers Courses, Artificial Intelligence Courses, Machine Learning Courses, Computational Complexity Courses, Self-Attention Courses

Course Description

Overview

Explore the computational power of transformers in this 57-minute lecture by Daniel Hsu from Columbia University. Delve into the relationship between self-attention layers and communication rounds in Massively Parallel Computation. Discover how logarithmic depth enables transformers to efficiently solve computational tasks that challenge other neural sequence models and sub-quadratic transformer approximations. Gain insights into parallelism as a crucial distinguishing feature of transformers. Learn about collaborative research with Clayton Sanford from Google and Matus Telgarsky from NYU, showing that a constant number of self-attention layers can simulate, and be simulated by, a constant number of communication rounds of Massively Parallel Computation.

Syllabus

Transformers, parallel computation, and logarithmic depth


Taught by

Simons Institute

Related Courses

Transformers: Text Classification for NLP Using BERT
LinkedIn Learning
TensorFlow: Working with NLP
LinkedIn Learning
TransGAN - Two Transformers Can Make One Strong GAN - Machine Learning Research Paper Explained
Yannic Kilcher via YouTube
Nyströmformer - A Nyström-Based Algorithm for Approximating Self-Attention
Yannic Kilcher via YouTube
Recreate Google Translate - Model Training
Edan Meyer via YouTube