YoVDO

Mixtral of Experts - Paper Explained

Offered By: Yannic Kilcher via YouTube

Tags

Artificial Intelligence Courses, Deep Learning Courses, Language Models Courses, Transformer Architecture Courses

Course Description

Overview

Explore an in-depth analysis of the Mixtral of Experts paper in this video lecture. The lecture covers Sparse Mixture of Experts (SMoE) language models, compares the architecture of Mixtral 8x7B with that of Mistral 7B, and examines Mixtral's performance against Llama 2 70B and GPT-3.5. Topics include expert routing, sparse expert routing, and expert parallelism, followed by the paper's experimental results, routing analysis, and conclusions.
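
For readers who want a concrete picture of the sparse expert routing mentioned above, here is a minimal Python sketch. It is not code from the paper or the lecture. It assumes a router that scores each expert with a dot product, keeps the top two of eight experts per token (the configuration Mixtral 8x7B is reported to use), and renormalizes the selected scores with a softmax. The function names, the toy scaling "experts", and the random router weights are hypothetical and chosen only for illustration.

```python
import math
import random


def top_k_route(router_logits, k=2):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    # Indices of the k largest logits, highest first.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax over just the selected logits (subtract the max for stability).
    m = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def sparse_moe_layer(x, router, experts, k=2):
    """Apply a sparse mixture-of-experts layer to one token vector x."""
    # Router: one logit per expert, here a plain dot product with x (assumption).
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    out = [0.0] * len(x)
    # Only the k selected experts are evaluated; the others are skipped.
    for idx, weight in top_k_route(logits, k):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


# Toy setup: 8 hypothetical experts, each one just scales its input.
random.seed(0)
dim, num_experts = 4, 8
router = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
experts = [lambda v, s=s: [s * vi for vi in v] for s in range(1, num_experts + 1)]

token = [0.5, -1.0, 0.25, 2.0]
token_logits = [sum(w * xi for w, xi in zip(row, token)) for row in router]
routes = top_k_route(token_logits)
output = sparse_moe_layer(token, router, experts)
print(routes)
print(output)
```

Because only two of the eight experts run for each token, the compute per token stays close to that of a much smaller dense model even though the total parameter count is large. In a real model each expert would be a feed-forward block and the router a learned linear layer.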

Syllabus

- Introduction
- Mixture of Experts
- Classic Transformer Blocks
- Expert Routing
- Sparse Expert Routing
- Expert Parallelism
- Experimental Results
- Routing Analysis
- Conclusion


Taught by

Yannic Kilcher

Related Courses

Introduction to Artificial Intelligence
Stanford University via Udacity
Probabilistic Graphical Models 1: Representation
Stanford University via Coursera
Artificial Intelligence for Robotics
Stanford University via Udacity
Computer Vision: The Fundamentals
University of California, Berkeley via Coursera
Learning from Data (Introductory Machine Learning course)
California Institute of Technology via Independent