Stanford Seminar - Mixture of Experts Paradigm and the Switch Transformer
Offered By: Stanford University via YouTube
Course Description
Overview
Explore the groundbreaking Mixture of Experts (MoE) paradigm and the Switch Transformer in this Stanford seminar. Delve into how MoE departs from traditional deep learning models, which apply the same parameters to every input, by selecting different parameters for each input, resulting in sparsely-activated models with vast numbers of parameters but a constant computational cost per token. Learn about the simplification of MoE routing algorithms, improved model designs with reduced communication and computational costs, and training techniques that address instabilities. Discover how large sparse models can be trained in lower-precision formats, leading to significant increases in pre-training speed. Examine the application of these improvements in multilingual settings and the scaling of language models to trillion-parameter sizes. Gain insights from research scientists Barret Zoph and Irwan Bello as they discuss their work on deep learning topics including neural architecture search, data augmentation, semi-supervised learning, and model sparsity.
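The key mechanism described above is top-1 ("switch") routing: a learned router sends each token to exactly one expert, so per-token compute stays fixed while total parameters grow with the number of experts. Below is a minimal NumPy sketch of that idea, not code from the seminar or the Switch Transformer implementation; the names switch_route, router_weights, and expert_weights are illustrative assumptions.

```
import numpy as np

def switch_route(tokens, router_weights, expert_weights):
    # Sketch of top-1 ("switch") routing: each token is processed by a single
    # expert, chosen by a softmax router, and scaled by its routing probability.
    logits = tokens @ router_weights                      # [num_tokens, num_experts]
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)

    expert_index = probs.argmax(axis=-1)                  # top-1 expert per token
    gate = probs[np.arange(len(tokens)), expert_index]    # its routing probability

    outputs = np.empty_like(tokens)
    for e in range(len(expert_weights)):
        mask = expert_index == e
        if mask.any():
            # Only the selected expert's parameters touch these tokens.
            outputs[mask] = gate[mask][:, None] * (tokens[mask] @ expert_weights[e])
    return outputs

# Toy usage: 8 tokens of width 16 routed among 4 experts.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(8, 16))
router_weights = rng.normal(size=(16, 4))
expert_weights = rng.normal(size=(4, 16, 16))
print(switch_route(tokens, router_weights, expert_weights).shape)  # (8, 16)
```

Because each token multiplies against only one expert's weight matrix, adding experts increases parameter count without increasing the FLOPs spent on any individual token.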
Syllabus
CS25 | Stanford Seminar 2022 - Mixture of Experts (MoE) Paradigm and the Switch Transformer
Taught by
Stanford Online
Related Courses
GShard - Scaling Giant Models with Conditional Computation and Automatic Sharding (Yannic Kilcher via YouTube)
Learning Mixtures of Linear Regressions in Subexponential Time via Fourier Moments (Association for Computing Machinery (ACM) via YouTube)
Modules and Architectures (Alfredo Canziani via YouTube)
Decoding Mistral AI's Large Language Models - Building Blocks and Training Strategies (Databricks via YouTube)
Pioneering a Hybrid SSM Transformer Architecture - Jamba Foundation Model (Databricks via YouTube)