Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads
Offered By: USENIX via YouTube
Course Description
Overview
Syllabus
Intro
Hardware for ML training is becoming highly specialized and heterogeneous!
How should we allocate heterogeneous resources?
Challenge 1: Heterogeneous performance
Challenge 2: Diverse scheduling objectives
Related work
Gavel: A new heterogeneity-aware cluster scheduler
Scheduling policies to be made heterogeneity-aware
Policies as optimization problems
Allocations (x) as time fractions
Effective throughput
Performance optimizations: space sharing and placement
How do we realize an optimal allocation?
Gavel's round-based scheduling
Main questions
Gavel improves objectives on a heterogeneous cluster
Gavel can enable the same heterogeneous cluster to support higher input load
Gavel can support hierarchical policies
Gavel scales to clusters with hundreds of active jobs
Conclusion
Taught by
USENIX
Related Courses
GraphX - Graph Processing in a Distributed Dataflow Framework
USENIX via YouTube
Theseus - An Experiment in Operating System Structure and State Management
USENIX via YouTube
RedLeaf - Isolation and Communication in a Safe Operating System
USENIX via YouTube
Microsecond Consensus for Microsecond Applications
USENIX via YouTube
KungFu - Making Training in Distributed Machine Learning Adaptive
USENIX via YouTube