Finetuning, Serving, and Evaluating Large Language Models in the Wild

Offered By: Open Data Science via YouTube

Course Description

Overview


This 29-minute conference talk by Dr. Hao Zhang, a postdoctoral researcher at UC Berkeley's Sky Lab, draws on his hands-on experience serving and evaluating more than 20 LLM-based chatbots within Chatbot Arena. It introduces Vicuna, an open-source chatbot fine-tuned from Meta's Llama, and the Chatbot Arena platform for real-world model evaluation. It then covers the challenges of serving many LLMs with high throughput and low latency on limited resources, and the key enabling techniques developed in collaboration with the LMSYS Org team: paged attention (vLLM) and statistical multiplexing with model parallelism (AlpaServe). The talk closes with efficient memory management for LLM inference, vLLM's memory efficiency, and its open-source adoption.
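
For readers new to paged attention, the following minimal sketch illustrates the general idea of paged memory management for an inference cache: the cache is split into fixed-size blocks, and each sequence keeps a table mapping its logical positions to whatever physical blocks happen to be free. This is an illustration under stated assumptions, not vLLM's actual API; the class name `PagedKVCacheAllocator`, its methods, and the block sizes are all hypothetical.

```python
class PagedKVCacheAllocator:
    """Toy block allocator illustrating paged cache management.

    Hypothetical names throughout; this is not vLLM's interface.
    """

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # physical block ids
        self.block_tables = {}  # seq_id -> list of physical block ids
        self.seq_lens = {}      # seq_id -> number of tokens stored

    def append_token(self, seq_id: str) -> tuple:
        """Reserve a slot for one more token; return (physical_block, offset)."""
        length = self.seq_lens.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        # A new physical block is needed only when the last one is full
        # (or when the sequence has no block yet).
        if length % self.block_size == 0:
            if not self.free_blocks:
                raise MemoryError("no free cache blocks")
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = length + 1
        return table[-1], length % self.block_size

    def free(self, seq_id: str) -> None:
        """Return all blocks of a finished sequence to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)


allocator = PagedKVCacheAllocator(num_blocks=4, block_size=2)
slots = [allocator.append_token("request-a") for _ in range(3)]
blocks_in_use = len(allocator.block_tables["request-a"])
allocator.free("request-a")
```

Because blocks are claimed one at a time as a sequence grows, memory is not reserved up front for the longest possible output, and at most one partly filled block per sequence is wasted.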

Syllabus

Welcome to the world of large language models with Dr. Hao Zhang, postdoctoral researcher at the Sky Lab, UC Berkeley. In this talk, "Finetuning, Serving, and Evaluating LLMs in the Wild," Hao shares his hands-on experience with serving and evaluating over 20 LLM-based chatbots, including Vicuna, within Chatbot Arena. In this video, you’ll get a deep dive into Vicuna, an open-source chatbot fine-tuned from Meta's Llama, and explore the Chatbot Arena platform designed for real-world model evaluations. Discover the challenges behind serving numerous LLMs, achieving high throughput, and ensuring low latency with limited university-donated GPUs. Hao presents the key enabling techniques, including paged attention (vLLM, SOSP’23) and statistical multiplexing with model parallelism (AlpaServe, OSDI’23), developed in collaboration with the LMSYS Org team at https://lmsys.org.
- Introductions
- Background
- An Example
- Chatbot Arena: Deployment & Elo-based Leaderboard
- Today’s Focus: Behind the Scene
- Key Insight
- vLLM: Efficient Memory Management for LLM Inference
- Memory Efficiency of vLLM
- vLLM Open-Source Adoption
- Key Idea
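
The syllabus mentions an Elo-based leaderboard for Chatbot Arena. As a rough sketch of how pairwise votes can be turned into ratings, the snippet below applies the standard Elo update to a list of head-to-head results. The K-factor of 32, the starting rating of 1000, and the model names are assumptions chosen for illustration; the listing does not state the parameters the Arena actually uses.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b).

    score_a is 1.0 if A wins, 0.0 if B wins, and 0.5 for a tie.
    k is an assumed K-factor, not a value taken from the talk.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


def rate_battles(battles, initial: float = 1000.0, k: float = 32.0) -> dict:
    """Process (model_a, model_b, score_a) tuples in order and return ratings."""
    ratings = {}
    for model_a, model_b, score_a in battles:
        ra = ratings.get(model_a, initial)
        rb = ratings.get(model_b, initial)
        ratings[model_a], ratings[model_b] = update_elo(ra, rb, score_a, k)
    return ratings


# Hypothetical votes: model_x wins once, then the two models tie.
ratings = rate_battles([
    ("model_x", "model_y", 1.0),
    ("model_y", "model_x", 0.5),
])
leaderboard = sorted(ratings.items(), key=lambda item: item[1], reverse=True)
```

With equal starting ratings, a single win moves the winner up by half the K-factor and the loser down by the same amount, so `update_elo(1000.0, 1000.0, 1.0)` returns `(1016.0, 984.0)`.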

Taught by

Open Data Science
