YoVDO

USHER: Holistic Interference Avoidance for Resource Optimized ML Inference

Offered By: USENIX via YouTube

Tags

Machine Learning Courses
Scheduling Algorithms Courses

Course Description

Overview

Explore a 15-minute conference talk from USENIX OSDI '24 that introduces USHER, a system for machine learning inference serving that maximizes resource utilization while avoiding interference between models on GPUs. Discover USHER's three key components: a fast, GPU kernel-based estimator of model resource requirements, an interference-aware scheduler that optimizes batch size and model placement, and an operator graph merger that minimizes interference in the GPU cache. Understand how USHER achieves significantly higher goodput and cost-efficiency than existing methods while scaling to thousands of GPUs. Gain insights into techniques for minimizing monetary cost and maximizing performance in deep learning inference systems.

Syllabus

OSDI '24 - USHER: Holistic Interference Avoidance for Resource Optimized ML Inference


Taught by

USENIX

Related Courses

Introduction to Artificial Intelligence
Stanford University via Udacity
Natural Language Processing
Columbia University via Coursera
Probabilistic Graphical Models 1: Representation
Stanford University via Coursera
Computer Vision: The Fundamentals
University of California, Berkeley via Coursera
Learning from Data (Introductory Machine Learning course)
California Institute of Technology via Independent