
Re-thinking Transformers: Searching for Efficient Linear Layers over a Continuous Space

Offered By: Simons Institute via YouTube

Tags

Transformers Courses
Neural Networks Courses
Scaling Laws Courses

Course Description

Overview

Explore cutting-edge research on efficient alternatives to dense linear layers in large neural networks through this 42-minute lecture by Andrew Gordon Wilson of New York University. Delve into a unifying framework that enables searching among all linear operators expressible via Einstein summation, encompassing previously proposed structures and introducing novel ones. Examine a taxonomy of these operators based on their computational and algebraic properties, and gain insight into their scaling laws. Discover the subset of structures that outperform dense layers in training compute efficiency. Learn how these structures extend naturally into sparse mixture-of-experts layers, significantly improving compute-optimal training efficiency for large language models. Gain valuable perspective on the future of Transformers and efficient linear layers in machine learning and artificial intelligence.
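To make the core idea concrete, here is a minimal sketch of one structured linear layer expressible via Einstein summation: a Kronecker-style factorization written in PyTorch. This is an illustrative assumption, not the lecture's exact parameterization; the class name, shapes, and initialization are all hypothetical.

```python
import torch
import torch.nn as nn

class KroneckerEinsumLinear(nn.Module):
    """Illustrative only: a Kronecker-factored linear layer written as a
    single einsum. The lecture's framework searches over many such
    einsum-expressible structures; this is just one point in that space."""

    def __init__(self, d1: int, d2: int, e1: int, e2: int):
        super().__init__()
        # Two small factors stand in for one dense (e1*e2) x (d1*d2) weight:
        # e1*d1 + e2*d2 parameters instead of e1*e2*d1*d2.
        self.A = nn.Parameter(torch.randn(e1, d1) / d1 ** 0.5)
        self.B = nn.Parameter(torch.randn(e2, d2) / d2 ** 0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d1*d2), reshaped so each factor acts on one axis.
        n = x.shape[0]
        x = x.view(n, self.A.shape[1], self.B.shape[1])
        # Apply (A kron B) to vec(x), expressed as one Einstein summation.
        y = torch.einsum("ai,bj,nij->nab", self.A, self.B, x)
        return y.reshape(n, -1)

layer = KroneckerEinsumLinear(d1=16, d2=16, e1=16, e2=16)
out = layer(torch.randn(8, 256))   # out.shape == (8, 256)
```

Swapping the einsum expression yields a different structure (low-rank, block-diagonal, and so on) behind the same interface, which is the kind of flexibility that makes searching over a space of einsum-expressible operators possible.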

Syllabus

Re-thinking Transformers: Searching for Efficient Linear Layers over a Continuous Space of...


Taught by

Simons Institute

Related Courses

Introduction To Mechanical Micro Machining
Indian Institute of Technology, Kharagpur via Swayam
Biomaterials - Intro to Biomedical Engineering
Udemy
OpenAI Whisper - Robust Speech Recognition via Large-Scale Weak Supervision
Aleksa Gordić - The AI Epiphany via YouTube
Turbulence as Gibbs Statistics of Vortex Sheets - Alexander Migdal
Institute for Advanced Study via YouTube
City Analytics - Professor Peter Grindrod CBE
Alan Turing Institute via YouTube