Finetuning, Serving, and Evaluating Large Language Models in the Wild
Offered By: Open Data Science via YouTube
Course Description
Overview
Welcome to the world of large language models with Dr. Hao Zhang, postdoctoral researcher at the Sky Lab, UC Berkeley. In this talk, "Finetuning, Serving, and Evaluating LLMs in the Wild," Hao shares his hands-on experience serving and evaluating over 20 LLM-based chatbots, including Vicuna, within Chatbot Arena. In this video, you’ll get a deep dive into Vicuna, an open-source chatbot fine-tuned from Meta's Llama, and explore Chatbot Arena, a platform designed for real-world model evaluation. Discover the challenges of serving numerous LLMs while achieving high throughput and low latency on a limited pool of university-donated GPUs. Hao presents the key enabling techniques, including paged attention (vLLM, SOSP’23) and statistical multiplexing with model parallelism (AlpaServe, OSDI’23), in collaboration with the LMSYS Org team at https://lmsys.org.
Syllabus
- Introductions
- Background
- An Example
- Chatbot Arena: Deployment & Elo-based Leaderboard
- Today’s Focus: Behind the Scenes
- Key Insight
- vLLM: Efficient Memory Management for LLM Inference
- Memory Efficiency of vLLM
- vLLM Open-Source Adoption
- Key Idea
Taught by
Open Data Science
Related Courses
- Macroeconometric Forecasting (International Monetary Fund via edX)
- Machine Learning With Big Data (University of California, San Diego via Coursera)
- Data Science at Scale - Capstone Project (University of Washington via Coursera)
- Structural Equation Model and its Applications | 结构方程模型及其应用 (Cantonese) (The Chinese University of Hong Kong via Coursera)
- Data Science in Action - Building a Predictive Churn Model (SAP Learning)