Beware of Fragmentation - Scheduling GPU-Sharing Workloads with Fragmentation Gradient Descent

Offered By: USENIX via YouTube

Tags

USENIX Annual Technical Conference Courses
Kubernetes Courses

Course Description

Overview

Explore a 22-minute conference talk from USENIX ATC '23 on GPU underutilization in large clusters. The talk explains how GPU sharing, in which several workloads run on fractions of the same GPU, leaves behind fragments of GPU capacity that cannot be allocated to new tasks. Learn about Fragmentation Gradient Descent (FGD), an approach that quantifies GPU fragmentation and schedules each workload so that fragmentation grows as little as possible. See how FGD, implemented as a new scheduler in Kubernetes, reduces the number of unallocated GPUs and improves overall utilization. The talk also covers the performance evaluation of FGD, which replays production traces on an emulated cluster of more than 6,200 GPUs.
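The idea of scheduling to minimize fragmentation growth can be illustrated with a small sketch. This is a toy model written for this listing, not the scheduler presented in the talk: the fragmentation measure, the `task_sizes` list of typical GPU demands, and the function names `fragmentation` and `place` are all assumptions made for illustration.

```python
from typing import List, Tuple


def fragmentation(free: List[float], task_sizes: List[float]) -> float:
    """Toy fragmentation measure (an assumption, not the talk's definition).

    For each typical task size, count the free GPU capacity sitting on GPUs
    that are too small to host a task of that size, then average over sizes.
    """
    total = 0.0
    for size in task_sizes:
        total += sum(f for f in free if f < size)
    return total / len(task_sizes)


def place(nodes: List[List[float]], demand: float,
          task_sizes: List[float]) -> Tuple[int, int]:
    """Assign a task needing `demand` of one GPU to the (node, gpu) slot
    whose use increases the node's fragmentation the least."""
    best = None
    best_delta = float("inf")
    for n, free in enumerate(nodes):
        before = fragmentation(free, task_sizes)
        for g, f in enumerate(free):
            if f < demand:
                continue  # this GPU cannot host the task
            after = free.copy()
            after[g] = f - demand
            delta = fragmentation(after, task_sizes) - before
            if delta < best_delta:
                best_delta, best = delta, (n, g)
    if best is None:
        raise ValueError("no GPU has enough free capacity")
    nodes[best[0]][best[1]] -= demand  # commit the placement
    return best


# Two hypothetical nodes with two GPUs each; values are free GPU fractions.
nodes = [[1.0, 0.5], [1.0, 1.0]]
chosen = place(nodes, demand=0.5, task_sizes=[0.5, 1.0])
print(chosen, nodes)
```

In this example the half-GPU task lands on the GPU that is already half used, so the fully free GPUs stay available for tasks that need a whole GPU.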

Syllabus

USENIX ATC '23 - Beware of Fragmentation: Scheduling GPU-Sharing Workloads with Fragmentation...


Taught by

USENIX

Related Courses

Amazon DynamoDB - A Scalable, Predictably Performant, and Fully Managed NoSQL Database Service
USENIX via YouTube
Faasm - Lightweight Isolation for Efficient Stateful Serverless Computing
USENIX via YouTube
AC-Key - Adaptive Caching for LSM-based Key-Value Stores
USENIX via YouTube
The Future of the Past - Challenges in Archival Storage
USENIX via YouTube
A Decentralized Blockchain with High Throughput and Fast Confirmation
USENIX via YouTube