Microsecond-scale Preemption for Concurrent GPU-accelerated DNN Inferences
Offered By: USENIX via YouTube
Course Description
Overview
Explore a conference talk from OSDI '22 that introduces REEF, a GPU-accelerated DNN inference serving system that enables microsecond-scale kernel preemption and controlled concurrent execution in GPU scheduling. REEF targets the problem of co-locating latency-critical and best-effort DNN inference tasks on the same GPU: real-time tasks must preempt quickly, while best-effort tasks should soak up the remaining capacity. Learn how REEF's reset-based preemption scheme achieves microsecond-scale preemption for real-time kernels, and how its dynamic kernel padding mechanism fills otherwise idle compute units with best-effort kernels to maximize GPU utilization. Examine evaluation results on a new DNN inference serving benchmark (DISB) and a real-world trace, showing that REEF keeps latency low for real-time tasks while substantially increasing overall throughput. Gain insights into applications of this technology in intelligent systems such as autonomous driving and virtual reality, and understand its implications for improving GPU scheduling efficiency across domains.
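To make the two mechanisms described above concrete, here is a minimal host-side sketch in Python. It is illustrative only: the class and method names (ReefLikeScheduler, submit_real_time, NUM_CUS, and so on) are hypothetical, the "kernels" are plain callables, and REEF's actual preemption and padding live inside the GPU runtime and firmware rather than in host code.

```python
import queue

class ReefLikeScheduler:
    """Toy model of two ideas from the REEF talk (hypothetical names):
    reset-based preemption and dynamic kernel padding."""

    NUM_CUS = 60  # assumed number of GPU compute units (illustrative)

    def __init__(self):
        self.best_effort = queue.Queue()  # pending best-effort kernels
        self.running_be = []              # best-effort kernels "on the GPU"

    def submit_best_effort(self, kernel):
        self.best_effort.put(kernel)

    def submit_real_time(self, rt_kernel, rt_cus_needed):
        # Reset-based preemption: kill running best-effort work outright
        # instead of draining it or saving its context. In the talk's
        # framing this is safe because DNN inference kernels are (mostly)
        # idempotent, and skipping context save/restore is what makes
        # preemption microsecond-scale.
        evicted = list(self.running_be)
        self.running_be.clear()

        # Dynamic kernel padding: if the real-time kernel cannot occupy
        # every compute unit, co-launch best-effort kernels on the
        # leftover CUs so the GPU stays busy during the real-time task.
        spare_cus = max(0, self.NUM_CUS - rt_cus_needed)
        padding = []
        while spare_cus > 0 and not self.best_effort.empty():
            padding.append(self.best_effort.get_nowait())
            spare_cus -= 1  # toy assumption: one CU per padded kernel

        # Dispatch the real-time kernel together with its padding.
        for k in [rt_kernel] + padding:
            k()

        # Evicted kernels are simply re-queued and re-executed later;
        # discarding their state is the core design choice modeled here.
        for k in evicted:
            self.best_effort.put(k)
```

The design point this sketch tries to capture is that eviction discards in-flight best-effort state rather than preserving it; re-execution is acceptable for idempotent inference kernels, which is why a reset-style scheme can preempt far faster than context-saving approaches.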
Syllabus
OSDI '22 - Microsecond-scale Preemption for Concurrent GPU-accelerated DNN Inferences
Taught by
USENIX
Related Courses
GraphX - Graph Processing in a Distributed Dataflow Framework (USENIX via YouTube)
Theseus - An Experiment in Operating System Structure and State Management (USENIX via YouTube)
RedLeaf - Isolation and Communication in a Safe Operating System (USENIX via YouTube)
Microsecond Consensus for Microsecond Applications (USENIX via YouTube)
KungFu - Making Training in Distributed Machine Learning Adaptive (USENIX via YouTube)