HetPipe - Enabling Large DNN Training on Heterogeneous GPU Clusters through Integration of Pipelined Model Parallelism and Data Parallelism

Offered By: USENIX via YouTube

Tags

USENIX Annual Technical Conference Courses
Artificial Intelligence Courses
Machine Learning Courses
Parallel Computing Courses

Course Description

Overview

Explore a conference talk that introduces HetPipe, a novel system for training large Deep Neural Network (DNN) models on heterogeneous GPU clusters. Learn how HetPipe integrates pipelined model parallelism with data parallelism to enable efficient training on diverse GPU architectures, including less powerful ones. Discover the Wave Synchronous Parallel (WSP) parameter synchronization model and its convergence proof. Examine experimental results demonstrating up to 49% faster convergence compared to state-of-the-art data parallelism techniques. Gain insights into the challenges of training large DNNs and innovative solutions for leveraging heterogeneous GPU resources effectively.
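The overview mentions that HetPipe partitions a model across GPUs of differing capability (see the "Partitioning" and "Hybrid Policy" syllabus items below). As a rough illustration of the idea, the sketch below splits a model's layers into contiguous pipeline stages in proportion to each GPU's relative speed. All names and numbers are illustrative assumptions, not HetPipe's actual algorithm or code.

```python
# Hypothetical sketch: assign contiguous layer ranges to heterogeneous
# GPUs so that each stage's compute load is roughly proportional to
# that GPU's relative speed. Illustrative only; not HetPipe's code.

def partition_layers(layer_costs, gpu_speeds):
    """Return one list of layer indices per GPU (pipeline stage)."""
    total_cost = sum(layer_costs)
    total_speed = sum(gpu_speeds)
    stages = []
    i = 0
    for g, speed in enumerate(gpu_speeds):
        if g == len(gpu_speeds) - 1:
            # Last GPU takes all remaining layers.
            stages.append(list(range(i, len(layer_costs))))
            break
        # Cost budget proportional to this GPU's share of total speed.
        budget = total_cost * speed / total_speed
        acc = 0.0
        stage = []
        while i < len(layer_costs) and (not stage or acc + layer_costs[i] <= budget):
            stage.append(i)
            acc += layer_costs[i]
            i += 1
        stages.append(stage)
    return stages
```

For example, eight equal-cost layers split across a GPU that is three times faster than its peer would land as six layers on the fast GPU and two on the slow one.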

Syllabus

Introduction
Motivation Background
Single Virtual Worker
Evaluation
Partitioning
Equal Distribution
Hybrid Policy
Parameter Placement Policy
Local Placement Policy
Convergence
Conclusion
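The syllabus items on parameter placement and convergence concern the Wave Synchronous Parallel (WSP) model described in the overview, under which workers synchronize with a bounded staleness. A minimal sketch of such a bounded-staleness admission check, in the spirit of WSP but with all names and semantics assumed for illustration, might look like:

```python
# Hypothetical sketch of a bounded-staleness check: a virtual worker
# may run at most `staleness_bound` waves ahead of the slowest worker
# before it must block and wait. Illustrative only; not HetPipe's code.

def may_proceed(worker_wave, all_worker_waves, staleness_bound):
    """Return True if this worker may start its next wave without waiting."""
    slowest = min(all_worker_waves)
    return worker_wave - slowest < staleness_bound
```

Bounding the gap keeps the fastest virtual workers from training on arbitrarily stale parameters, which is what makes a convergence argument possible.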


Taught by

USENIX

Related Courses

Amazon DynamoDB - A Scalable, Predictably Performant, and Fully Managed NoSQL Database Service
USENIX via YouTube
Faasm - Lightweight Isolation for Efficient Stateful Serverless Computing
USENIX via YouTube
AC-Key - Adaptive Caching for LSM-based Key-Value Stores
USENIX via YouTube
The Future of the Past - Challenges in Archival Storage
USENIX via YouTube
A Decentralized Blockchain with High Throughput and Fast Confirmation
USENIX via YouTube