Sparse Expert Models - Switch Transformers, GLAM, and More With the Authors
Offered By: Yannic Kilcher via YouTube
Course Description
Overview
Syllabus
- Intro
- What are sparse expert models?
- Start of Interview
- What do you mean by sparse experts?
- How does routing work in these models?
- What is the history of sparse experts?
- What does an individual expert learn?
- When are these models appropriate?
- How comparable are sparse to dense models?
- How does the pathways system connect to this?
- What improvements did GLAM make?
- The "designing sparse experts" paper
- Can experts be frozen during training?
- Can the routing function be improved?
- Can experts be distributed beyond data centers?
- Are there sparse experts for other domains than NLP?
- Are sparse and dense models in competition?
- Where do we go from here?
- How can people get started with this?
Taught by
Yannic Kilcher
Related Courses
Sequence ModelsDeepLearning.AI via Coursera Modern Natural Language Processing in Python
Udemy Stanford Seminar - Transformers in Language: The Development of GPT Models Including GPT-3
Stanford University via YouTube Long Form Question Answering in Haystack
James Briggs via YouTube Spotify's Podcast Search Explained
James Briggs via YouTube