YoVDO

Metis - Fast Automatic Distributed Training on Heterogeneous GPUs

Offered By: USENIX via YouTube

Tags

Distributed Training Courses, Deep Learning Courses, GPU Computing Courses, Heterogeneous Computing Courses

Course Description

Overview

This USENIX ATC '24 conference talk introduces Metis, a system for automatic distributed training on heterogeneous GPUs. As deep learning models grow, training them efficiently increasingly requires using several GPU types together. The talk explains how Metis optimizes key system components to exploit the differing compute power and memory capacity of each GPU type, enabling fine-grained distribution of training workloads. It presents a search algorithm that prunes the large search space and balances load in a heterogeneity-aware way. The evaluation covers parallelism plans for large models such as GPT-3, MoE, and Wide-ResNet across multiple GPU types, and reports training speed-ups together with lower profiling and search overheads than traditional methods and oracle planning.
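To make "heterogeneity-aware load balancing" concrete, here is a minimal Python sketch that splits a global batch across GPU types in proportion to their relative throughput. It is an illustration only, not the Metis search algorithm: the function name `split_batch`, the device labels, and the throughput figures are all hypothetical, and a real planner would also account for memory capacity and communication cost.

```python
def split_batch(global_batch, throughputs):
    """Split `global_batch` samples across devices in proportion to throughput.

    `throughputs` maps a device label to a relative speed (hypothetical values).
    Uses the largest-remainder method so the shares always sum to `global_batch`.
    """
    total = sum(throughputs.values())
    # Ideal (fractional) share for each device.
    ideal = {dev: global_batch * t / total for dev, t in throughputs.items()}
    # Start from the integer part of each ideal share.
    shares = {dev: int(share) for dev, share in ideal.items()}
    # Give the leftover samples to the devices with the largest remainders.
    leftover = global_batch - sum(shares.values())
    by_remainder = sorted(ideal, key=lambda dev: ideal[dev] - shares[dev], reverse=True)
    for dev in by_remainder[:leftover]:
        shares[dev] += 1
    return shares


if __name__ == "__main__":
    # Assumed relative speeds: the faster device is 3x the slower one.
    print(split_batch(64, {"fast_gpu": 3.0, "slow_gpu": 1.0}))
```

With equal throughputs the split degenerates to an even division, which is the homogeneous case that a heterogeneity-unaware planner would assume for every device.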

Syllabus

USENIX ATC '24 - Metis: Fast Automatic Distributed Training on Heterogeneous GPUs


Taught by

USENIX

Related Courses

Neural Networks for Machine Learning
University of Toronto via Coursera
機器學習技法 (Machine Learning Techniques)
National Taiwan University via Coursera
Machine Learning Capstone: An Intelligent Application with Deep Learning
University of Washington via Coursera
Прикладные задачи анализа данных (Applied Problems in Data Analysis)
Moscow Institute of Physics and Technology via Coursera
Leading Ambitious Teaching and Learning
Microsoft via edX