Orca - A Distributed Serving System for Transformer-Based Generative Models

Offered By: USENIX via YouTube

Tags

OSDI (Operating Systems Design and Implementation) Courses
Distributed Systems Courses
Scheduling Algorithms Courses

Course Description

Overview

Explore a conference talk on Orca, a distributed serving system for Transformer-based generative models. The talk covers the challenges of serving large language models such as GPT-3, where autoregressive generation runs the model once per output token, so a single request spans multiple iterations. Learn about the two techniques Orca introduces: iteration-level scheduling, which schedules execution one iteration at a time rather than one request batch at a time, and selective batching, which applies batching only to selected operations. See how these techniques improve latency and throughput over existing inference serving systems, and how Orca's architecture and scheduling mechanisms support multi-iteration workloads.
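
To make iteration-level scheduling more concrete, here is a minimal Python sketch, not Orca's actual implementation. The names `Request`, `fake_model_step`, and `serve` are hypothetical, and the "model" just appends a placeholder token. The point it illustrates is that the scheduler revisits the batch after every iteration, so finished requests leave immediately and waiting requests join as soon as a slot is free.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    """A hypothetical generation request; field names are illustrative."""
    rid: str
    max_new_tokens: int
    generated: list = field(default_factory=list)

    def done(self) -> bool:
        return len(self.generated) >= self.max_new_tokens


def fake_model_step(batch):
    """Stand-in for one model iteration: emits one token per request."""
    for req in batch:
        req.generated.append(f"tok{len(req.generated)}")


def serve(requests, max_batch_size=2):
    """Schedule per iteration; return (request id, finish iteration) pairs."""
    waiting = deque(requests)
    running = []
    finished = []
    iteration = 0
    while waiting or running:
        # Admit waiting requests whenever a slot is free, at every iteration.
        while waiting and len(running) < max_batch_size:
            running.append(waiting.popleft())
        fake_model_step(running)
        iteration += 1
        # Release finished requests right away instead of holding them
        # until the slowest request in the batch completes.
        still_running = []
        for req in running:
            if req.done():
                finished.append((req.rid, iteration))
            else:
                still_running.append(req)
        running = still_running
    return finished


result = serve([Request("a", 1), Request("b", 4), Request("c", 2)])
print(result)  # [('a', 1), ('c', 3), ('b', 4)]
```

In this toy run, request "c" takes the slot freed by "a" after the first iteration and finishes at iteration 3. Under request-level scheduling it would have had to wait for the whole first batch, including the longer request "b", to complete before starting.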

Syllabus

Intro
Generative Models
Inference of Generative Language Models
Serving of Generative Language Models
Problem 1: Request-Level Scheduling
Solution 1: Iteration-Level Scheduling
Problem 2: Batching
Solution 2: Selective Batching
Orca System Architecture
Scheduling
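
The selective batching item in the syllabus can be illustrated with a small pure-Python sketch, written under simplifying assumptions and not taken from Orca's code. The functions `linear`, `attention`, and `selective_batch_layer` are hypothetical, and the `attention` here is a toy causal average standing in for real self-attention. The idea shown is that token-wise operations run once over the flattened tokens of all requests, while the operation that mixes tokens within a sequence runs separately per request, so requests of different lengths can share a batch.

```python
def linear(tokens, weight):
    """Token-wise op: multiply each token vector (row) by a weight matrix."""
    return [[sum(x * w for x, w in zip(tok, col)) for col in zip(*weight)]
            for tok in tokens]


def attention(seq):
    """Toy per-request op standing in for self-attention.

    Each token becomes the mean of itself and all earlier tokens
    in the same request.
    """
    out = []
    for i in range(len(seq)):
        prefix = seq[: i + 1]
        out.append([sum(col) / len(prefix) for col in zip(*prefix)])
    return out


def selective_batch_layer(batch, weight):
    # 1. Flatten requests of different lengths into a single token list.
    lengths = [len(seq) for seq in batch]
    flat = [tok for seq in batch for tok in seq]
    # 2. Run the token-wise op once over all tokens (batched).
    flat = linear(flat, weight)
    # 3. Split back per request and run attention on each one separately.
    out, start = [], 0
    for n in lengths:
        out.append(attention(flat[start:start + n]))
        start += n
    return out


weight = [[2, 0], [0, 2]]           # doubles every token vector
batch = [
    [[1.0, 2.0]],                   # request with 1 token
    [[1.0, 0.0], [3.0, 2.0]],       # request with 2 tokens
]
out = selective_batch_layer(batch, weight)
print(out)  # [[[2.0, 4.0]], [[2.0, 0.0], [4.0, 2.0]]]
```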


Taught by

USENIX

Related Courses

GraphX - Graph Processing in a Distributed Dataflow Framework
USENIX via YouTube
Theseus - An Experiment in Operating System Structure and State Management
USENIX via YouTube
RedLeaf - Isolation and Communication in a Safe Operating System
USENIX via YouTube
Microsecond Consensus for Microsecond Applications
USENIX via YouTube
KungFu - Making Training in Distributed Machine Learning Adaptive
USENIX via YouTube