Microsecond-scale Preemption for Concurrent GPU-accelerated DNN Inferences
Offered By: USENIX via YouTube
Course Description
Overview
Explore a conference talk from OSDI '22 that introduces REEF, a GPU-accelerated DNN inference serving system that enables microsecond-scale kernel preemption and controlled concurrent execution in GPU scheduling. REEF targets the problem of co-locating latency-critical and best-effort DNN inference tasks on the same GPU: real-time tasks must preempt quickly, while best-effort tasks should soak up the remaining capacity. Learn how REEF's reset-based preemption scheme achieves microsecond-scale preemption for real-time kernels, and how its dynamic kernel padding mechanism fills otherwise idle compute units with best-effort kernels to maximize GPU utilization. Examine evaluation results on a new DNN inference serving benchmark (DISB) and a real-world trace, showing that REEF keeps latency low for real-time tasks while substantially increasing overall throughput. Gain insights into applications of this technology in intelligent systems such as autonomous driving and virtual reality, and understand its implications for improving GPU scheduling efficiency across domains.
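To make the two mechanisms described above concrete, here is a minimal host-side sketch in Python. It is illustrative only: the class and method names (ReefLikeScheduler, submit_real_time, NUM_CUS, and so on) are hypothetical, the "kernels" are plain callables, and REEF's actual preemption and padding live inside the GPU runtime and firmware rather than in host code.

```python
import queue

class ReefLikeScheduler:
    """Toy model of two ideas from the REEF talk (hypothetical names):
    reset-based preemption and dynamic kernel padding."""

    NUM_CUS = 60  # assumed number of GPU compute units (illustrative)

    def __init__(self):
        self.best_effort = queue.Queue()  # pending best-effort kernels
        self.running_be = []              # best-effort kernels "on the GPU"

    def submit_best_effort(self, kernel):
        self.best_effort.put(kernel)

    def submit_real_time(self, rt_kernel, rt_cus_needed):
        # Reset-based preemption: kill running best-effort work outright
        # instead of draining it or saving its context. In the talk's
        # framing this is safe because DNN inference kernels are (mostly)
        # idempotent, and skipping context save/restore is what makes
        # preemption microsecond-scale.
        evicted = list(self.running_be)
        self.running_be.clear()

        # Dynamic kernel padding: if the real-time kernel cannot occupy
        # every compute unit, co-launch best-effort kernels on the
        # leftover CUs so the GPU stays busy during the real-time task.
        spare_cus = max(0, self.NUM_CUS - rt_cus_needed)
        padding = []
        while spare_cus > 0 and not self.best_effort.empty():
            padding.append(self.best_effort.get_nowait())
            spare_cus -= 1  # toy assumption: one CU per padded kernel

        # Dispatch the real-time kernel together with its padding.
        for k in [rt_kernel] + padding:
            k()

        # Evicted kernels are simply re-queued and re-executed later;
        # discarding their state is the core design choice modeled here.
        for k in evicted:
            self.best_effort.put(k)
```

The design point this sketch tries to capture is that eviction discards in-flight best-effort state rather than preserving it; re-execution is acceptable for idempotent inference kernels, which is why a reset-style scheme can preempt far faster than context-saving approaches.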
Syllabus
OSDI '22 - Microsecond-scale Preemption for Concurrent GPU-accelerated DNN Inferences
Taught by
USENIX
Related Courses
GraphX - Graph Processing in a Distributed Dataflow Framework (USENIX via YouTube)
Theseus - An Experiment in Operating System Structure and State Management (USENIX via YouTube)
RedLeaf - Isolation and Communication in a Safe Operating System (USENIX via YouTube)
Microsecond Consensus for Microsecond Applications (USENIX via YouTube)
KungFu - Making Training in Distributed Machine Learning Adaptive (USENIX via YouTube)