Accelerating Distributed MoE Training and Inference with Lina
Offered By: USENIX via YouTube
Course Description
Overview
This USENIX ATC '23 conference talk presents Lina, a system for accelerating distributed Mixture of Experts (MoE) training and inference. It covers the challenges of scaling model parameters and explains how sparsely activated models make it possible to train larger models at lower cost. The talk gives a systematic analysis of all-to-all communication overhead in distributed MoE and identifies the main causes of the bottleneck in training and in inference. Lina addresses these bottlenecks through tensor partitioning and dynamic resource scheduling. Experiments on an A100 GPU testbed show that Lina improves training step time and reduces inference time compared to state-of-the-art systems.
Syllabus
USENIX ATC '23 - Accelerating Distributed MoE Training and Inference with Lina
Taught by
USENIX
Related Courses
Scalable Data Science (Indian Institute of Technology, Kharagpur via Swayam)
Data Science and Engineering with Spark (Berkeley University of California via edX)
Data Science on Google Cloud: Machine Learning (Google via Qwiklabs)
Modern Distributed Systems (Delft University of Technology via edX)
KungFu - Making Training in Distributed Machine Learning Adaptive (USENIX via YouTube)