Sparse Expert Models - Switch Transformers, GLAM, and More With the Authors

Offered By: Yannic Kilcher via YouTube

Course Description

Overview

Explore sparse expert models in this interview with Google Brain researchers Barret Zoph and William Fedus. The discussion covers the fundamentals, history, strengths, and weaknesses of these models, including Switch Transformers and GLAM, which scale to trillions of parameters. Learn how sparse expert models spread parts of a Transformer across large arrays of machines and use a routing function to activate only a small subset of experts for each input token, so most of the model's parameters stay idle for any given token. Discover the advantages of this approach, its applications in natural language processing, and potential future developments. Gain insights into how sparse models compare with dense ones, the improvements introduced by GLAM, and the possibility of distributing experts beyond data centers. Whether you're a machine learning enthusiast or a seasoned researcher, this discussion offers a grounded view of the current state of the art in sparse expert models and their potential impact on artificial intelligence.
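To make the routing idea concrete, here is a minimal pure-Python sketch of top-1 ("switch") routing, the scheme Switch Transformers use; GLaM instead sends each token to its top two experts. This is an illustrative toy, not the authors' code. The names `Top1MoELayer`, `softmax`, `matvec`, and `random_matrix` are hypothetical, each "expert" is assumed to be a single linear map standing in for a Transformer feed-forward block, and real systems add pieces omitted here, such as expert capacity limits, a load-balancing loss, and placing experts on different devices.

```python
import math
import random


def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    # matrix is a list of rows; returns matrix @ vec.
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class Top1MoELayer:
    """Toy switch-style layer (hypothetical): each token goes to exactly one expert.

    Assumption: each expert is a single d_model x d_model linear map instead of
    the feed-forward network used in a real Transformer layer.
    """

    def __init__(self, d_model, num_experts, seed=0):
        rng = random.Random(seed)
        # Router weights: one row per expert, producing one logit per expert.
        self.router = random_matrix(num_experts, d_model, rng)
        self.experts = [random_matrix(d_model, d_model, rng) for _ in range(num_experts)]

    def route(self, token):
        # Softmax over router logits, then pick the single most probable expert.
        probs = softmax(matvec(self.router, token))
        expert_id = max(range(len(probs)), key=probs.__getitem__)
        return expert_id, probs[expert_id]

    def forward(self, tokens):
        outputs, assignments = [], []
        for token in tokens:
            expert_id, gate = self.route(token)
            # Only the selected expert runs; all other experts are skipped for this token.
            expert_out = matvec(self.experts[expert_id], token)
            # Scaling by the gate probability is what lets a trained router receive gradients.
            outputs.append([gate * y for y in expert_out])
            assignments.append(expert_id)
        return outputs, assignments


if __name__ == "__main__":
    layer = Top1MoELayer(d_model=8, num_experts=4)
    rng = random.Random(1)
    tokens = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(6)]
    outputs, assignments = layer.forward(tokens)
    print(assignments)  # which expert each of the 6 tokens was routed to
```

The design point the sketch illustrates: adding more experts grows the parameter count, but each token still passes through exactly one expert, so the compute per token stays roughly constant.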

Syllabus

- Intro
- What are sparse expert models?
- Start of Interview
- What do you mean by sparse experts?
- How does routing work in these models?
- What is the history of sparse experts?
- What does an individual expert learn?
- When are these models appropriate?
- How comparable are sparse to dense models?
- How does the pathways system connect to this?
- What improvements did GLAM make?
- The "designing sparse experts" paper
- Can experts be frozen during training?
- Can the routing function be improved?
- Can experts be distributed beyond data centers?
- Are there sparse experts for other domains than NLP?
- Are sparse and dense models in competition?
- Where do we go from here?
- How can people get started with this?

Taught by

Yannic Kilcher
