LoRAX - Serving Thousands of Fine-Tuned LLMs on a Single GPU
Offered By: Linux Foundation via YouTube
Course Description
Overview
Explore the LoRAX (LoRA eXchange) LLM inference system in this conference talk. Learn how LoRAX packs thousands of fine-tuned LoRA adapters onto a single GPU, dramatically reducing serving costs compared with a dedicated deployment per fine-tuned model. Discover the key features of this open-source, production-ready system, which is free for commercial use and ships with pre-built Docker images and Helm charts. Delve into the core techniques that make LoRAX a cost-effective and efficient way to serve fine-tuned LLMs in production: Dynamic Adapter Loading, Heterogeneous Continuous Batching, and Adapter Exchange Scheduling. Gain insight into how these techniques optimize latency, throughput, and resource utilization while many adapters are served concurrently on a single GPU.
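The idea behind Dynamic Adapter Loading and Adapter Exchange Scheduling can be sketched as a bounded, least-recently-used set of adapters kept on the GPU: adapters are fetched on demand and the least recently used one is offloaded when capacity is exceeded. The class and method names below are purely illustrative, not the real LoRAX API:

```python
from collections import OrderedDict

class AdapterExchange:
    """Illustrative sketch (not the LoRAX API): keep a bounded LRU set of
    LoRA adapters 'on GPU', loading on demand and evicting the least
    recently used adapter when capacity is exceeded."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.on_gpu = OrderedDict()  # adapter_id -> weights, most recent last

    def get(self, adapter_id: str):
        # Cache hit: mark the adapter as most recently used.
        if adapter_id in self.on_gpu:
            self.on_gpu.move_to_end(adapter_id)
            return self.on_gpu[adapter_id]
        # Cache miss: offload the LRU adapter if at capacity, then load.
        if len(self.on_gpu) >= self.capacity:
            self.on_gpu.popitem(last=False)
        weights = self._load_from_storage(adapter_id)
        self.on_gpu[adapter_id] = weights
        return weights

    def _load_from_storage(self, adapter_id: str):
        # Stand-in for fetching adapter weights from disk or a model hub.
        return {"id": adapter_id}

exchange = AdapterExchange(capacity=2)
exchange.get("a")
exchange.get("b")
exchange.get("a")  # refreshes "a", so "b" is now least recently used
exchange.get("c")  # at capacity: "b" is evicted
print(sorted(exchange.on_gpu))  # ['a', 'c']
```

In the actual system this bookkeeping is interleaved with continuous batching, so requests targeting different adapters can share the same base-model forward pass.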
Syllabus
LoRAX: Serve 1000s of Fine-Tuned LLMs on a Single GPU - Travis Addair, Predibase, Inc.
Taught by
Linux Foundation
Related Courses
Cloud Computing Applications, Part 1: Cloud Systems and Infrastructure - University of Illinois at Urbana-Champaign via Coursera
Introduction to Cloud Infrastructure Technologies - Linux Foundation via edX
Introduction to Containers - Microsoft Virtual Academy via OpenClassrooms
The Docker for DevOps course: From development to production - Udemy
Windows Server 2016: Virtualization - Microsoft via edX