LoRAX - Serving Thousands of Fine-Tuned LLMs on a Single GPU

Offered By: Linux Foundation via YouTube

Tags

Machine Learning Courses Docker Courses Kubernetes Courses LoRA (Low-Rank Adaptation) Courses Batch Processing Courses

Course Description

Overview

Explore the LoRAX (LoRA eXchange) LLM inference system in this conference talk. Learn how LoRAX packs thousands of fine-tuned LoRA adapters onto a single GPU, sharply reducing serving costs compared with a dedicated deployment per fine-tuned model. Discover the key features of this open-source, free-for-commercial-use, production-ready system, including pre-built Docker images and Helm charts. Delve into the core techniques that make LoRAX cost-effective and efficient for serving fine-tuned LLMs in production: Dynamic Adapter Loading, Heterogeneous Continuous Batching, and Adapter Exchange Scheduling. Gain insight into how these techniques optimize latency, throughput, and resource utilization while many adapters run concurrently on a single GPU.
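To give a feel for the dynamic-adapter-loading idea described above, here is a toy sketch (illustrative only, not the actual LoRAX implementation): a bounded LRU cache stands in for GPU memory, adapters are loaded on demand when a request references them, and the least recently used adapter is evicted when the cache is full. All names (`AdapterCache`, `serve`) are hypothetical.

```python
from collections import OrderedDict

class AdapterCache:
    """Toy model of dynamic adapter loading: keep at most `capacity`
    LoRA adapters "resident on the GPU", load others on demand, and
    evict the least recently used adapter when space runs out."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._resident = OrderedDict()  # adapter_id -> placeholder weights
        self.loads = 0                  # host->GPU transfers (cache misses)

    def get(self, adapter_id: str) -> str:
        if adapter_id in self._resident:
            self._resident.move_to_end(adapter_id)  # mark recently used
            return self._resident[adapter_id]
        self.loads += 1  # simulate fetching adapter weights onto the GPU
        if len(self._resident) >= self.capacity:
            self._resident.popitem(last=False)      # evict LRU adapter
        self._resident[adapter_id] = f"weights[{adapter_id}]"
        return self._resident[adapter_id]

def serve(requests, cache):
    """Serve a stream of (adapter_id, prompt) requests; many different
    adapters share one cache, loosely mirroring how heterogeneous
    continuous batching mixes adapters within a single GPU's memory."""
    return [(cache.get(adapter_id), prompt) for adapter_id, prompt in requests]
```

Running a request stream such as `serve([("a", "p1"), ("b", "p2"), ("a", "p3"), ("c", "p4"), ("b", "p5")], AdapterCache(capacity=2))` shows that only cache misses trigger loads; the real system applies the same principle to keep expensive host-to-GPU transfers off the hot path.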

Syllabus

LoRAX: Serve 1000s of Fine-Tuned LLMs on a Single GPU - Travis Addair, Predibase, Inc.


Taught by

Linux Foundation

Related Courses

Introduction to Artificial Intelligence
Stanford University via Udacity
Natural Language Processing
Columbia University via Coursera
Probabilistic Graphical Models 1: Representation
Stanford University via Coursera
Computer Vision: The Fundamentals
University of California, Berkeley via Coursera
Learning from Data (Introductory Machine Learning course)
California Institute of Technology via Independent