Transformers, Parallel Computation, and Logarithmic Depth
Offered By: Simons Institute via YouTube
Course Description
Overview
Explore the computational power of transformers in this 57-minute lecture by Daniel Hsu from Columbia University. Delve into the relationship between self-attention layers and communication rounds in Massively Parallel Computation. Discover how logarithmic depth enables transformers to efficiently solve computational tasks that challenge other neural sequence models and sub-quadratic transformer approximations. Gain insights into parallelism as a crucial distinguishing feature of transformers. Learn about the collaborative research with Clayton Sanford from Google and Matus Telgarsky from NYU, which shows that a constant number of self-attention layers can simulate, and be simulated by, a constant number of communication rounds of Massively Parallel Computation.
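To make the counting unit in the lecture title concrete, here is a minimal sketch of a single self-attention layer in NumPy. This is purely illustrative and not the lecture's construction: the matrices Wq, Wk, Wv and the helper name self_attention are assumptions for the example, and "depth" refers to how many such layers are stacked.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """One self-attention layer: every position attends to all others.

    X          : (n, d) sequence of n token embeddings
    Wq, Wk, Wv : (d, d) query/key/value projections (illustrative parameters)
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(X.shape[1])            # (n, n) pairwise scores
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)     # softmax over positions
    return weights @ V                                # (n, d) attention output

# Depth = number of such layers stacked; the lecture's result concerns how
# few layers (logarithmic in sequence length) suffice for certain tasks.
rng = np.random.default_rng(0)
n, d = 8, 4
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (8, 4)
```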
Syllabus
Transformers, parallel computation, and logarithmic depth
Taught by
Simons Institute
Related Courses
Automata Theory - Stanford University via edX
Introduction to Computational Thinking and Data Science - Massachusetts Institute of Technology via edX
Design and Analysis of Algorithms - Peking University via Coursera
How to Win Coding Competitions: Secrets of Champions - ITMO University via edX
Introduction to Computer Science with Python, Part 2 - Universidade de São Paulo via Coursera