YoVDO

Accelerating Distributed MoE Training and Inference with Lina

Offered By: USENIX via YouTube

Tags

USENIX Annual Technical Conference Courses Model Optimization Courses Distributed Machine Learning Courses

Course Description

Overview

Explore a conference talk that delves into accelerating distributed Mixture of Experts (MoE) training and inference using Lina. Learn about the challenges of scaling model parameters and the potential of sparsely activated models to train larger models at lower costs. Discover the systematic analysis of all-to-all communication overhead in distributed MoE and understand the main causes of bottlenecks in training and inference. Examine Lina's innovative approach to addressing these bottlenecks through tensor partitioning and dynamic resource scheduling. Gain insights into how Lina improves training step time and reduces inference time compared to state-of-the-art systems, as demonstrated through experiments on an A100 GPU testbed.
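To make the dispatch-and-partition idea concrete, here is a minimal Python sketch (not Lina's actual implementation; the gate, token format, and chunk size are illustrative assumptions). It shows how a gate routes tokens to experts, why distributed MoE needs an all-to-all exchange, and how partitioning each expert's tensor into smaller chunks creates the opportunity to overlap communication with expert computation:

```python
# Conceptual MoE dispatch sketch -- NOT Lina's code.
# In distributed MoE, each expert lives on a different device, so routing
# tokens to experts is an all-to-all exchange. Lina, as described in the
# talk, partitions the exchanged tensors so all-to-all communication can
# overlap with expert compute instead of blocking it.

def top1_gate(token: str, num_experts: int) -> int:
    """Toy top-1 gate: a hash stands in for a learned router."""
    return hash(token) % num_experts

def dispatch(tokens: list, num_experts: int) -> list:
    """Group tokens by destination expert (the 'send' side of all-to-all)."""
    buckets = [[] for _ in range(num_experts)]
    for t in tokens:
        buckets[top1_gate(t, num_experts)].append(t)
    return buckets

def partition(bucket: list, chunk_size: int) -> list:
    """Split one expert's bucket into fixed-size chunks; smaller partitions
    let communication of one chunk overlap with compute on another."""
    return [bucket[i:i + chunk_size] for i in range(0, len(bucket), chunk_size)]

if __name__ == "__main__":
    tokens = [f"tok{i}" for i in range(16)]
    buckets = dispatch(tokens, num_experts=4)
    chunks = [partition(b, chunk_size=2) for b in buckets]
```

Because token-to-expert routing is decided at runtime, bucket sizes are uneven and data-dependent, which is what makes static scheduling of the all-to-all hard and motivates the dynamic resource scheduling discussed in the talk.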

Syllabus

USENIX ATC '23 - Accelerating Distributed MoE Training and Inference with Lina


Taught by

USENIX

Related Courses

Amazon DynamoDB - A Scalable, Predictably Performant, and Fully Managed NoSQL Database Service
USENIX via YouTube
Faasm - Lightweight Isolation for Efficient Stateful Serverless Computing
USENIX via YouTube
AC-Key - Adaptive Caching for LSM-based Key-Value Stores
USENIX via YouTube
The Future of the Past - Challenges in Archival Storage
USENIX via YouTube
A Decentralized Blockchain with High Throughput and Fast Confirmation
USENIX via YouTube