Pollux - Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning

Offered By: USENIX via YouTube

Course Description

Overview

This 14-minute conference talk from OSDI '21 presents Pollux, a co-adaptive cluster scheduler for deep learning that optimizes goodput rather than raw throughput. Goodput is a metric that combines system throughput with statistical efficiency, so it reflects useful training progress rather than only the number of examples processed per second. Pollux considers per-job and cluster-wide factors together and dynamically reassigns resources, which improves resource allocation and utilization across the cluster. The talk covers background on distributed deep learning with data parallelism, how batch size affects both system throughput and statistical efficiency, and the key components of the Pollux scheduler: modeling system throughput, modeling statistical efficiency, and optimizing cluster-wide allocations. It closes with evaluation results and discusses how the approach can reduce average job completion time, promote fairness, and potentially lower costs in cloud environments.
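As a rough illustration of the goodput idea described in the overview, the sketch below multiplies a throughput estimate by a statistical-efficiency estimate and picks the batch size that maximizes the product. Everything here is an assumption made for illustration: the function names, the fixed-overhead throughput model and its constants are hypothetical, and the efficiency formula is one form based on a gradient noise scale, not necessarily the exact model used in the talk.

```python
def statistical_efficiency(noise_scale, base_batch, batch):
    """Training progress per example at `batch`, relative to `base_batch`.

    Assumed form: (noise_scale + base_batch) / (noise_scale + batch).
    It equals 1.0 at the base batch size and decays as the batch grows.
    """
    return (noise_scale + base_batch) / (noise_scale + batch)


def throughput(batch, per_example_time=0.001, per_step_overhead=0.05):
    """Examples per second under a toy cost model (hypothetical constants).

    Each step pays a fixed overhead plus a per-example compute cost, so
    larger batches amortize the overhead and raise throughput.
    """
    return batch / (per_step_overhead + per_example_time * batch)


def goodput(batch, noise_scale, base_batch):
    """Useful progress per second: throughput times statistical efficiency."""
    return throughput(batch) * statistical_efficiency(noise_scale, base_batch, batch)


def best_batch(candidates, noise_scale, base_batch):
    """Return the candidate batch size with the highest goodput."""
    return max(candidates, key=lambda m: goodput(m, noise_scale, base_batch))


if __name__ == "__main__":
    sizes = [32, 64, 128, 256, 512, 1024]
    for phi in (500, 5000):  # hypothetical gradient noise scales
        print(phi, best_batch(sizes, noise_scale=phi, base_batch=32))
```

In this toy model, throughput alone always favors the largest batch, while goodput favors a moderate batch that shifts upward as the noise scale grows. That shift is the reason a scheduler would need to re-tune batch size and resources together over the course of training.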

Syllabus

Intro
Deep Learning Training in Shared Clusters
Example Shared-Cluster DL Training Workflow
Pollux: Co-adaptive Cluster Scheduler for DL
Outline
Background: Distributed DL (Data Parallelism)
System Throughput and Impact of Batch Size
Statistical Efficiency and Impact of Batch Size
Illustration of Overall Training Performance
Implications for Cluster Scheduling
Pollux Cluster Scheduler
Key Idea: Goodput, not Throughput
Modeling System Throughput
Modeling Statistical Efficiency
Optimizing Cluster-Wide Allocations
Evaluation of Pollux
Cluster-Wide Statistical Efficiency
More Experiments in our Paper!
Conclusion

Taught by

USENIX
