LoRAX - Serving Thousands of Fine-Tuned LLMs on a Single GPU
Offered By: Linux Foundation via YouTube
Course Description
Overview
Explore the LoRAX (LoRA eXchange) LLM inference system in this conference talk. Learn how LoRAX packs thousands of fine-tuned LoRA adapters onto a single GPU, dramatically reducing serving costs compared with a dedicated deployment per fine-tuned model. Discover the key features of this open-source, production-ready system, which is free for commercial use and ships with pre-built Docker images and Helm charts. Delve into the core techniques that make LoRAX a cost-effective and efficient way to serve fine-tuned LLMs in production: Dynamic Adapter Loading, Heterogeneous Continuous Batching, and Adapter Exchange Scheduling. Gain insight into how these techniques optimize latency, throughput, and resource utilization while many adapters are served concurrently on a single GPU.
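The idea behind Dynamic Adapter Loading and Adapter Exchange Scheduling can be sketched as a bounded, least-recently-used set of adapters kept on the GPU: adapters are fetched on demand and the least recently used one is offloaded when capacity is exceeded. The class and method names below are purely illustrative, not the real LoRAX API:

```python
from collections import OrderedDict

class AdapterExchange:
    """Illustrative sketch (not the LoRAX API): keep a bounded LRU set of
    LoRA adapters 'on GPU', loading on demand and evicting the least
    recently used adapter when capacity is exceeded."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.on_gpu = OrderedDict()  # adapter_id -> weights, most recent last

    def get(self, adapter_id: str):
        # Cache hit: mark the adapter as most recently used.
        if adapter_id in self.on_gpu:
            self.on_gpu.move_to_end(adapter_id)
            return self.on_gpu[adapter_id]
        # Cache miss: offload the LRU adapter if at capacity, then load.
        if len(self.on_gpu) >= self.capacity:
            self.on_gpu.popitem(last=False)
        weights = self._load_from_storage(adapter_id)
        self.on_gpu[adapter_id] = weights
        return weights

    def _load_from_storage(self, adapter_id: str):
        # Stand-in for fetching adapter weights from disk or a model hub.
        return {"id": adapter_id}

exchange = AdapterExchange(capacity=2)
exchange.get("a")
exchange.get("b")
exchange.get("a")  # refreshes "a", so "b" is now least recently used
exchange.get("c")  # at capacity: "b" is evicted
print(sorted(exchange.on_gpu))  # ['a', 'c']
```

In the actual system this bookkeeping is interleaved with continuous batching, so requests targeting different adapters can share the same base-model forward pass.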
Syllabus
LoRAX: Serve 1000s of Fine-Tuned LLMs on a Single GPU - Travis Addair, Predibase, Inc.
Taught by
Linux Foundation
Related Courses
Cloud Computing Applications, Part 1: Cloud Systems and Infrastructure - University of Illinois at Urbana-Champaign via Coursera
Introduction to Cloud Infrastructure Technologies - Linux Foundation via edX
Introduction to Containers - Microsoft Virtual Academy via OpenClassrooms
The Docker for DevOps course: From development to production - Udemy
Windows Server 2016: Virtualization - Microsoft via edX