
Generalized Pipeline Parallelism for DNN Training - PipeDream System Overview

Offered By: Databricks via YouTube

Tags

Deep Neural Networks Courses
Distributed Training Courses

Course Description

Overview

Explore generalized pipeline parallelism for DNN training in this 21-minute conference talk from Databricks. The talk presents PipeDream, a system that combines inter-batch pipelining with intra-batch parallelism to improve the throughput of parallel deep neural network training. It explains how PipeDream handles the problems that pipelining introduces, such as weight version mismatches and pipeline flushes, through weight versioning and efficient scheduling, and how it automatically partitions DNN layers among workers to balance the workload and minimize communication. According to the talk, PipeDream trains up to 5.3x faster than traditional intra-batch parallelism techniques while maintaining high accuracy. Topics include model parallelism, weight stashing, assigning operators to pipeline stages, and double-buffered weight updates. The talk is aimed at anyone interested in speeding up DNN training and working around the memory limits of large-scale models.
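The overview mentions that PipeDream automatically partitions layers among workers to balance the workload. As a much-simplified illustration, assume per-layer compute times have already been profiled and ignore communication costs and stage replication (both of which PipeDream's actual partitioner takes into account). The balancing part can then be framed as splitting an ordered list of layers into contiguous stages so that the slowest stage, which bounds steady-state pipeline throughput, is as fast as possible. The function name `partition_layers` and the timing numbers below are hypothetical.

```python
from functools import lru_cache
from typing import List, Tuple


def partition_layers(layer_times: List[float], num_stages: int) -> Tuple[float, List[List[int]]]:
    """Split layers (kept in order) into contiguous stages minimizing the slowest stage.

    Hypothetical sketch: only balances compute time; it does NOT model
    inter-stage communication or replicated stages as PipeDream does.
    """
    n = len(layer_times)
    if not 1 <= num_stages <= n:
        raise ValueError("need 1 <= num_stages <= number of layers")

    # prefix[i] = total time of layers[0:i], so any stage cost is a difference.
    prefix = [0.0]
    for t in layer_times:
        prefix.append(prefix[-1] + t)

    @lru_cache(maxsize=None)
    def best(i: int, k: int) -> Tuple[float, Tuple[int, ...]]:
        # Optimal bottleneck for layers[i:] split into k stages,
        # plus the start index of every stage after the first.
        if k == 1:
            return prefix[n] - prefix[i], ()
        best_cost, best_cuts = float("inf"), ()
        # The first stage covers layers[i:j]; leave at least k - 1 layers for the rest.
        for j in range(i + 1, n - k + 2):
            rest_cost, rest_cuts = best(j, k - 1)
            cost = max(prefix[j] - prefix[i], rest_cost)
            if cost < best_cost:
                best_cost, best_cuts = cost, (j,) + rest_cuts
        return best_cost, best_cuts

    cost, cuts = best(0, num_stages)
    bounds = [0, *cuts, n]
    stages = [list(range(bounds[s], bounds[s + 1])) for s in range(num_stages)]
    return cost, stages


if __name__ == "__main__":
    bottleneck, stages = partition_layers([4, 2, 3, 1, 5, 1], 3)
    print(bottleneck, stages)
```

With six hypothetical layers and three stages, the slowest stage takes 6 time units and the layers split as `[[0], [1, 2, 3], [4, 5]]`. The dynamic program tries every possible end point for the first stage and recurses on the remaining layers, which is cheap for the layer counts of typical models.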

Syllabus

Intro
Model Parallelism: An alternative to data parallelism
Pipelining in DNN training != Traditional pipelining
Challenge 1: Pipelining leads to weight version mismatches
Weight stashing: A solution to version mismatches
Challenge 2: How do we assign operators to pipeline stages?
PipeDream vs. Data Parallelism on Time-to-Accuracy
But modern deep neural networks are becoming extremely large!
Double-buffered weight updates: weight semantics
2BW has weight update semantics similar to data parallelism
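The syllabus names weight stashing as the fix for weight version mismatches: with several microbatches in flight, a stage may update its weights between a microbatch's forward and backward passes, so the backward pass must reuse the version the forward pass saw. The toy stage below is a minimal sketch of that idea, assuming a single scalar weight and a layer computing `y = w * x`; the class `StashingStage` and its methods are hypothetical and not PipeDream's API.

```python
from typing import Dict, Tuple


class StashingStage:
    """Hypothetical one-parameter pipeline stage computing y = w * x.

    Weight stashing sketch: the weight (and input) seen by a microbatch's
    forward pass are saved and reused by that microbatch's backward pass,
    even if newer weight versions were produced in between.
    """

    def __init__(self, w: float, lr: float) -> None:
        self.w = w    # latest weight version
        self.lr = lr  # learning rate for plain SGD updates
        self.stash: Dict[int, Tuple[float, float]] = {}  # microbatch id -> (w, x)

    def forward(self, mb_id: int, x: float) -> float:
        # Stash the weight version and activation this forward pass uses.
        self.stash[mb_id] = (self.w, x)
        return self.w * x

    def backward(self, mb_id: int, grad_y: float) -> float:
        w_used, x = self.stash.pop(mb_id)  # same version as the forward pass
        grad_w = grad_y * x                # dL/dw for y = w * x
        grad_x = grad_y * w_used           # gradient sent upstream uses the stashed weight
        self.w -= self.lr * grad_w         # the update is applied to the latest version
        return grad_x


if __name__ == "__main__":
    stage = StashingStage(w=1.0, lr=0.1)
    stage.forward(0, 2.0)
    stage.forward(1, 3.0)        # second microbatch enters before the first finishes
    print(stage.backward(0, 1.0))  # updates w from 1.0 to 0.8
    print(stage.backward(1, 1.0))  # still uses the stashed w = 1.0, not 0.8
```

Without the stash, microbatch 1's backward pass would compute its upstream gradient with `w = 0.8`, a different version from the one used in its forward pass. The stash grows with the number of in-flight microbatches; the double-buffered (2BW) scheme listed in the syllabus is a way to cap how many weight versions are kept, which this sketch does not attempt to model.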


Taught by

Databricks

Related Courses

Custom and Distributed Training with TensorFlow
DeepLearning.AI via Coursera
Architecting Production-ready ML Models Using Google Cloud ML Engine
Pluralsight
Building End-to-end Machine Learning Workflows with Kubeflow
Pluralsight
Deploying PyTorch Models in Production: PyTorch Playbook
Pluralsight
Inside TensorFlow
TensorFlow via YouTube