
Themis - Fair and Efficient GPU Cluster Scheduling

Offered By: USENIX via YouTube

Tags

USENIX Symposium on Networked Systems Design and Implementation (NSDI) Courses
Machine Learning Courses
Deep Learning Courses

Course Description

Overview

Explore a 22-minute conference talk from USENIX NSDI '20 that introduces Themis, a scheduling framework for GPU clusters shared by distributed machine learning training jobs. Learn why allocating GPUs both fairly and efficiently across multiple ML jobs is hard, and how Themis approaches the problem with a two-level scheduling architecture. See how the talk defines finish-time fairness, which compares a job's finish time on the shared cluster with its finish time on a dedicated 1/N share of the cluster, and how Themis enforces it with an auction-based (partial allocation) resource allocation mechanism. Finally, review the evaluation comparing Themis with existing GPU cluster schedulers on sharing incentive and cluster efficiency.
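To make the finish-time fairness metric concrete, here is a minimal sketch assuming each job can report two estimates: T_shared, its expected finish time in the shared cluster, and T_independent, its finish time if it ran alone on 1/N of the cluster. The metric is rho = T_shared / T_independent, so rho <= 1 means the job is no worse off than with its own slice. The `Job` class, the function names, and the sample numbers are hypothetical illustrations, not Themis code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    name: str
    # Hypothetical inputs: estimated finish time in the shared cluster, and the
    # finish time the job would see running alone on 1/N of the cluster.
    t_shared: float
    t_independent: float


def finish_time_fairness(job: Job) -> float:
    """rho = T_shared / T_independent; rho <= 1 means sharing incentive holds."""
    return job.t_shared / job.t_independent


def rank_by_unfairness(jobs: List[Job]) -> List[Job]:
    """Order jobs from worst-off (largest rho) to best-off (smallest rho)."""
    return sorted(jobs, key=finish_time_fairness, reverse=True)


if __name__ == "__main__":
    jobs = [
        Job("resnet", t_shared=30.0, t_independent=20.0),  # rho = 1.5
        Job("bert", t_shared=45.0, t_independent=50.0),    # rho = 0.9
        Job("gpt", t_shared=80.0, t_independent=40.0),     # rho = 2.0
    ]
    for job in rank_by_unfairness(jobs):
        print(job.name, round(finish_time_fairness(job), 2))
```

In this toy input, `gpt` is the worst-off job, while `bert` already finishes sooner than it would on a dedicated 1/N share; a scheduler aiming for finish-time fairness would prioritize jobs at the top of this ranking.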

Syllabus

Intro
Deep Learning at a Large Enterprise
GPU Cluster Scheduler: Goal
Existing GPU Cluster Schedulers
GPU Cluster Scheduler: Drawback 2
GPU Cluster Scheduler: Requirement 2
Towards a new GPU Cluster Scheduler
Themis: Metric
Themis: Finish-Time Fairness Metric
Themis: Interface
Strawman Mechanism: Issues
Themis: Mechanism: Partial Allocation Auction
Themis: Overall Design
Themis: Implementation
Themis: Evaluation
Macrobenchmark: Sharing Incentive
Macrobenchmark: Efficiency
Conclusion
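The syllabus names a partial allocation auction as Themis' mechanism. Below is a minimal sketch of the general partial allocation idea from mechanism design, assuming each app bids a valuation for every possible GPU count (for example 1/rho under that allocation). First, a proportional-fair split maximizes the product of valuations. Then each app keeps only a fraction c_i of its share, where c_i is the other apps' product of valuations with app i present divided by their best product without it. The bid format, function names, brute-force search, and rounding down to whole GPUs are hypothetical simplifications, not Themis' implementation; the sketch only reports withheld GPUs as leftover and does not redistribute them.

```python
import itertools
import math
from typing import Dict, List, Tuple

# Hypothetical bid format: bids[app][g] is the app's valuation of receiving
# g GPUs (e.g. 1/rho for that allocation), for g = 0..total GPUs.
Bids = Dict[str, List[float]]


def pf_allocation(bids: Bids, apps: List[str], gpus: int) -> Tuple[Dict[str, int], float]:
    """Brute-force the proportional-fair split: maximize the product of the
    apps' valuations over all integer splits using at most `gpus` GPUs."""
    best: Dict[str, int] = {a: 0 for a in apps}
    best_score = -1.0
    for split in itertools.product(range(gpus + 1), repeat=len(apps)):
        if sum(split) > gpus:
            continue
        score = math.prod(bids[a][g] for a, g in zip(apps, split))
        if score > best_score:
            best, best_score = dict(zip(apps, split)), score
    return best, best_score


def partial_allocation(bids: Bids, gpus: int) -> Tuple[Dict[str, int], int]:
    """Each app keeps a fraction c_i of its proportional-fair share; c_i shrinks
    the more app i's presence lowers the other apps' product of valuations."""
    apps = list(bids)
    alloc, _ = pf_allocation(bids, apps, gpus)
    granted: Dict[str, int] = {}
    leftover = 0
    for i in apps:
        others = [a for a in apps if a != i]
        # Others' product of valuations when app i participates ...
        with_i = math.prod(bids[a][alloc[a]] for a in others)
        # ... versus their best achievable product if app i were absent.
        _, without_i = pf_allocation(bids, others, gpus)
        c_i = with_i / without_i if without_i > 0 else 1.0
        # Round down to whole GPUs (a simplification); epsilon guards float error.
        share = math.floor(c_i * alloc[i] + 1e-9)
        granted[i] = share
        leftover += alloc[i] - share
    return granted, leftover


if __name__ == "__main__":
    bids = {"A": [0.0, 1.0, 1.9, 2.7, 3.4], "B": [0.0, 1.0, 1.5, 1.8, 2.0]}
    print(partial_allocation(bids, gpus=4))
```

With these made-up bids and 4 GPUs, the proportional-fair split gives each app 2 GPUs, but each keeps only 1 after the partial-allocation penalty, leaving 2 GPUs as leftover. The brute-force search is exponential in the number of apps and only suits toy inputs.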


Taught by

USENIX
