Serving DNNs like Clockwork - Performance Predictability from the Bottom Up
Offered By: USENIX via YouTube
Course Description
Overview
Explore a conference talk from OSDI '20 that delves into the performance predictability of serving Deep Neural Networks (DNNs). Learn about Clockwork, a distributed model serving system designed to achieve consistent low latency for machine learning inference in interactive web applications. Discover how the researchers leverage the deterministic performance of DNN inferences to build a system that can support thousands of models while meeting strict latency targets. Examine the principles behind Clockwork's design, its ability to achieve tight request-level service-level objectives (SLOs), and its high degree of request-level performance isolation. Gain insights into addressing common-case sources of latency and curtailing tail latency caused by unpredictable execution times in model serving architectures.
Syllabus
Introduction
High Tail Latencies
Predictable Worker
Clockwork
Clockwork Example
Conclusion
Taught by
USENIX
Related Courses
GraphX - Graph Processing in a Distributed Dataflow Framework
USENIX via YouTube
Theseus - An Experiment in Operating System Structure and State Management
USENIX via YouTube
RedLeaf - Isolation and Communication in a Safe Operating System
USENIX via YouTube
Microsecond Consensus for Microsecond Applications
USENIX via YouTube
KungFu - Making Training in Distributed Machine Learning Adaptive
USENIX via YouTube