USHER: Holistic Interference Avoidance for Resource Optimized ML Inference
Offered By: USENIX via YouTube
Course Description
Overview
Explore a 15-minute conference talk from USENIX OSDI '24 that introduces USHER, a novel system for optimizing machine learning inference serving. Learn how USHER maximizes resource utilization while avoiding inter-model interference on GPUs. Discover the three key components of USHER: a fast GPU kernel-based model resource estimator, an interference-aware scheduler for optimizing batch size and model placement, and an operator graph merger to minimize GPU cache interference. Understand how USHER achieves significantly higher goodput and cost-efficiency compared to existing methods, with the ability to scale to thousands of GPUs. Gain insights into techniques for minimizing monetary costs and maximizing performance in deep learning inference systems.
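To make the scheduler component concrete, here is a minimal, hypothetical sketch of interference-aware model placement in the spirit of the talk: given per-model resource estimates (as USHER's estimator would produce), models are bin-packed onto GPUs so that no GPU's combined compute or memory utilization exceeds a cap. All names, thresholds, and the greedy first-fit-decreasing heuristic are illustrative assumptions, not USHER's actual algorithm or API.

```python
from dataclasses import dataclass, field

@dataclass
class Model:
    name: str
    compute_util: float  # estimated SM utilization fraction (hypothetical estimator output)
    memory_util: float   # estimated GPU memory fraction

@dataclass
class GPU:
    models: list = field(default_factory=list)
    compute: float = 0.0
    memory: float = 0.0

def place(models, cap=1.0):
    """Greedy first-fit-decreasing placement: consider the largest models
    first, put each onto the first GPU where it fits without pushing either
    resource past `cap` (a stand-in for "no interference"); otherwise open
    a new GPU. Fewer GPUs at bounded utilization ~ higher cost-efficiency."""
    gpus = []
    for m in sorted(models, key=lambda m: max(m.compute_util, m.memory_util),
                    reverse=True):
        for g in gpus:
            if g.compute + m.compute_util <= cap and g.memory + m.memory_util <= cap:
                break
        else:  # no existing GPU can host this model without oversubscription
            g = GPU()
            gpus.append(g)
        g.models.append(m.name)
        g.compute += m.compute_util
        g.memory += m.memory_util
    return gpus

if __name__ == "__main__":
    demo = [Model("resnet", 0.6, 0.3), Model("bert", 0.5, 0.5),
            Model("gpt2", 0.4, 0.6), Model("yolo", 0.3, 0.2)]
    for i, g in enumerate(place(demo)):
        print(i, g.models, round(g.compute, 2), round(g.memory, 2))
```

The real system additionally co-optimizes batch sizes with placement and merges operator graphs to reduce cache interference; this sketch only captures the packing intuition.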
Syllabus
OSDI '24 - USHER: Holistic Interference Avoidance for Resource Optimized ML Inference
Taught by
USENIX
Related Courses
Real Time Operating System - Indian Institute of Technology, Kharagpur via Swayam
Build Your Own RealTime OS (RTOS) From Ground Up™ on ARM 1 - Udemy
Real-Time Systems - NPTEL via YouTube
Embedded and Real Time Operating Systems - 5 Minutes Engineering via YouTube
RA: Supply Chain Applications with R & Shiny: Inventory - Udemy