HetPipe - Enabling Large DNN Training on Heterogeneous GPU Clusters through Integration of Pipelined Model Parallelism and Data Parallelism
Offered By: USENIX via YouTube
Course Description
Overview
Explore a conference talk that introduces HetPipe, a novel system for training large Deep Neural Network (DNN) models on heterogeneous GPU clusters. Learn how HetPipe integrates pipelined model parallelism with data parallelism to enable efficient training on diverse GPU architectures, including less powerful ones. Discover the Wave Synchronous Parallel (WSP) parameter synchronization model and its convergence proof. Examine experimental results demonstrating up to 49% faster convergence compared to state-of-the-art data parallelism techniques. Gain insights into the challenges of training large DNNs and innovative solutions for leveraging heterogeneous GPU resources effectively.
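To make the Wave Synchronous Parallel (WSP) idea concrete: each virtual worker processes a wave of minibatches in its pipeline, pushes the wave's aggregated update to a parameter server, and a bounded staleness limit keeps fast workers from running too many waves ahead of the slowest one. The sketch below is a minimal illustration of that staleness-bounded synchronization, not HetPipe's actual implementation; the names (ParameterServer, virtual_worker, STALENESS) and the toy gradient are hypothetical stand-ins, and the per-wave pipelining inside each virtual worker is collapsed into a single step.

```python
# Minimal WSP-style sketch (illustrative, not HetPipe's code):
# virtual workers push one aggregated update per wave and are blocked
# once they run more than STALENESS waves ahead of the slowest worker.
import threading
import numpy as np

STALENESS = 2      # max clock distance between fastest and slowest worker
NUM_WORKERS = 3    # each would be a group of heterogeneous GPUs in HetPipe
WAVES = 10         # one "wave" = one pipeline-full of minibatches

class ParameterServer:
    def __init__(self, dim):
        self.weights = np.zeros(dim)
        self.clocks = [0] * NUM_WORKERS   # waves completed per worker
        self.cond = threading.Condition()

    def push_and_clock(self, wid, update):
        with self.cond:
            self.weights += update        # apply the wave's aggregated update
            self.clocks[wid] += 1
            self.cond.notify_all()
            # enforce the staleness bound: wait until the slowest worker
            # is within STALENESS waves of this one
            while self.clocks[wid] - min(self.clocks) > STALENESS:
                self.cond.wait()
            return self.weights.copy()    # pull fresh weights for the next wave

def virtual_worker(wid, ps, dim):
    weights = np.zeros(dim)
    rng = np.random.default_rng(wid)
    for _ in range(WAVES):
        # stand-in for the pipelined forward/backward passes over a wave
        grad = -0.1 * weights + rng.normal(scale=0.01, size=dim)
        weights = ps.push_and_clock(wid, grad)

ps = ParameterServer(dim=4)
threads = [threading.Thread(target=virtual_worker, args=(i, ps, 4))
           for i in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("final weights:", ps.weights)
```

The staleness bound is what lets heterogeneous virtual workers of different speeds make progress without the fastest ones training on arbitrarily stale parameters, which is the property the talk's convergence proof addresses.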
Syllabus
Introduction
Motivation Background
Single Virtual Worker
Evaluation
Partitioning
Equal Distribution
Hybrid Policy
Parameter Placement Policy
Local Placement Policy
Convergence
Conclusion
Taught by
USENIX
Related Courses
Intro to Parallel Programming - Nvidia via Udacity
Introduction to Linear Models and Matrix Algebra - Harvard University via edX
Introduction to Parallel Programming Using OpenMP and MPI - Tomsk State University via Coursera
Supercomputing - Partnership for Advanced Computing in Europe via FutureLearn
Fundamentals of Parallelism on Intel Architecture - Intel via Coursera