YoVDO

dLoRA - Dynamically Orchestrating Requests and Adapters for LoRA LLM Serving

Offered By: USENIX via YouTube

Tags

LoRA (Low-Rank Adaptation) Courses
Load Balancing Courses

Course Description

Overview

This conference talk presents dLoRA, an inference serving system for large language models (LLMs) that use LoRA (Low-Rank Adaptation) adapters. dLoRA dynamically orchestrates requests and LoRA adapters through two capabilities: merging and unmerging adapters with the base model, and migrating requests and adapters between worker replicas. The talk explains the insights behind these capabilities, namely how skew in requests across adapters affects merging decisions, and how the varying input and output lengths of autoregressive LLM requests cause load imbalance across replicas. It covers the credit-based batching algorithm that decides when to merge and unmerge, and the request-adapter co-migration algorithm that decides when to migrate. The reported results show throughput up to 57.9× higher than vLLM and up to 26.0× higher than Hugging Face PEFT, and average latency up to 1.8× lower than the concurrent work S-LoRA.
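
For readers new to LoRA, the following is a minimal sketch of what merging and unmerging an adapter means in general. It is not dLoRA's code and does not reproduce its batching or migration algorithms. It assumes the standard LoRA formulation, in which a low-rank update B·A is added to a base weight W; the helper names (`matmul`, `add_scaled`, `merge`, `unmerge`) and the tiny integer matrices are hypothetical and chosen only to keep the arithmetic easy to check.

```python
# Illustrative sketch only: plain-Python matrices (lists of rows),
# assuming the standard LoRA update W' = W + scale * (B @ A).
# All names and values here are hypothetical, not taken from dLoRA.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def add_scaled(w, delta, scale):
    """Return w + scale * delta, element-wise."""
    return [[wi + scale * di for wi, di in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

def merge(w, lora_b, lora_a, scale=1):
    """Fold the low-rank update B @ A into the base weight."""
    return add_scaled(w, matmul(lora_b, lora_a), scale)

def unmerge(w_merged, lora_b, lora_a, scale=1):
    """Subtract the same update to recover the base weight."""
    return add_scaled(w_merged, matmul(lora_b, lora_a), -scale)

# Base weight (2x2) and a rank-1 adapter: B is 2x1, A is 1x2.
W = [[1, 2], [3, 4]]
B = [[1], [2]]
A = [[3, 4]]
x = [[1], [1]]  # one input, as a column vector

# Unmerged path: the base model and the adapter are applied separately,
# so one batch can mix requests that target different adapters.
y_unmerged = add_scaled(matmul(W, x), matmul(B, matmul(A, x)), 1)

# Merged path: a single matrix multiply, but the weight now serves
# only this one adapter until it is unmerged again.
W_merged = merge(W, B, A)
y_merged = matmul(W_merged, x)

W_restored = unmerge(W_merged, B, A)
```

Both paths produce the same output for a request that uses this adapter. The merged path avoids the extra adapter multiplications, while the unmerged path keeps the base weight shared across adapters, which is the general trade-off that makes the choice depend on how skewed the requests are.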

Syllabus

OSDI '24 - dLoRA: Dynamically Orchestrating Requests and Adapters for LoRA LLM Serving


Taught by

USENIX

Related Courses

How to Do Stable Diffusion LORA Training by Using Web UI on Different Models
Software Engineering Courses - SE Courses via YouTube
MicroPython & WiFi
Kevin McAleer via YouTube
Building a Wireless Community Sensor Network with LoRa
Hackaday via YouTube
ComfyUI - Node Based Stable Diffusion UI
Olivio Sarikas via YouTube
AI Masterclass for Everyone - Stable Diffusion, ControlNet, Depth Map, LORA, and VR
Hugh Hou via YouTube