YoVDO

Mixture-of-Experts Courses

GShard - Scaling Giant Models with Conditional Computation and Automatic Sharding
Yannic Kilcher via YouTube
Learning Mixtures of Linear Regressions in Subexponential Time via Fourier Moments
Association for Computing Machinery (ACM) via YouTube
Modules and Architectures
Alfredo Canziani via YouTube
Stanford Seminar - Mixture of Experts Paradigm and the Switch Transformer
Stanford University via YouTube
Decoding Mistral AI's Large Language Models - Building Blocks and Training Strategies
Databricks via YouTube
Pioneering a Hybrid SSM Transformer Architecture - Jamba Foundation Model
Databricks via YouTube
LLaMA 3 Deep Dive - Synthetic Data, Privacy, and Model Architecture
Aleksa Gordić - The AI Epiphany via YouTube
Microsoft's Phi 3.5 - Latest Small Language Models Overview
Sam Witteveen via YouTube
Developing and Serving RAG-Based LLM Applications in Production
Anyscale via YouTube
Mixtral Fine-Tuning and Inference - Advanced Guide
Trelis Research via YouTube