YoVDO

Fairness in Serving Large Language Models

Offered By: USENIX via YouTube

Tags

Fairness Courses, Scheduling Algorithms Courses

Course Description

Overview

Explore a 16-minute conference talk from USENIX's OSDI '24 program on the challenges of ensuring fairness when serving Large Language Models (LLMs). Learn about the Virtual Token Counter (VTC), a scheduling algorithm designed for the particular demands of LLM inference services. Discover how this approach improves on traditional per-client request rate limits by accounting for the input and output tokens actually processed, leading to better resource utilization and client experience. Examine the proof of a 2× tight upper bound on the service difference between backlogged clients and see how VTC outperforms baseline methods under various conditions. Gain insight into what makes fair scheduling hard for LLMs: request lengths are unpredictable, and requests are batched together on parallel accelerators. The reproducible code is available for those who want to dig deeper into this research on fairness in LLM serving.
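To make the token-counting idea concrete, here is a minimal Python sketch of a scheduler that tracks weighted input and output tokens per client and always serves the backlogged client with the least service so far. It is an illustration only, not the authors' implementation: the class name, method names, and default weights are hypothetical, the step that lifts a returning client's counter is an assumption of this sketch, and batching on the accelerator is ignored entirely.

```python
from collections import defaultdict, deque


class VirtualTokenCounterSketch:
    """Toy scheduler: serve the backlogged client with the least weighted service.

    Hypothetical illustration; names and weights are assumptions, not from the talk.
    """

    def __init__(self, input_weight=1.0, output_weight=2.0):
        # Illustrative weights only; the description does not state any values.
        self.input_weight = input_weight
        self.output_weight = output_weight
        self.counters = defaultdict(float)  # client -> weighted tokens served
        self.queues = defaultdict(deque)    # client -> pending input lengths

    def submit(self, client, input_tokens):
        """Queue a request of `input_tokens` prompt tokens for `client`."""
        if not self.queues[client]:
            # Assumption of this sketch: a client that becomes backlogged again
            # is lifted to the smallest counter among other backlogged clients,
            # so idle time cannot be saved up as credit.
            active = [self.counters[c]
                      for c, q in self.queues.items() if q and c != client]
            if active:
                self.counters[client] = max(self.counters[client], min(active))
        self.queues[client].append(input_tokens)

    def next_request(self):
        """Pop the next request, or return None if nothing is pending."""
        backlogged = [c for c, q in self.queues.items() if q]
        if not backlogged:
            return None
        client = min(backlogged, key=lambda c: self.counters[c])
        input_tokens = self.queues[client].popleft()
        # Charge the prompt tokens when the request is admitted.
        self.counters[client] += self.input_weight * input_tokens
        return client, input_tokens

    def record_output(self, client, output_tokens):
        """Charge generated tokens as they are produced."""
        self.counters[client] += self.output_weight * output_tokens
```

In this sketch, a client that sends long prompts or receives long completions accumulates counter value faster and therefore yields to lighter clients, which a plain requests-per-minute limit cannot express because it treats every request as equal in cost.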

Syllabus

OSDI '24 - Fairness in Serving Large Language Models


Taught by

USENIX

Related Courses

Real Time Operating System
Indian Institute of Technology, Kharagpur via Swayam
Build Your Own RealTime OS (RTOS) From Ground Up™ on ARM 1
Udemy
Real-Time Systems
NPTEL via YouTube
Embedded and Real Time Operating Systems
5 Minutes Engineering via YouTube
RA: Supply Chain Applications with R & Shiny: Inventory.
Udemy