YoVDO

Serve a Custom LLM for Over 100 Customers - GPU Selection, Quantization, and API Setup

Offered By: Trelis Research via YouTube

Tags

API Development Courses Quantization Courses vLLM Courses

Course Description

Overview

Learn how to serve a custom Large Language Model (LLM) to more than 100 customers in this 52-minute video tutorial. It covers choosing a server, selecting API serving software, and GPU optimization, with one-click templates for quick deployment and guidance on using quantization to get the most out of a cheaper GPU. Follow a step-by-step Vast.ai setup, serve Mistral with vLLM and AWQ while handling concurrent requests, explore function calling models, and run API speed tests, including concurrent ones. By the end, you will have the skills to deploy and manage an LLM efficiently for multiple users.
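The course's point about quantization fitting a model onto a cheaper GPU comes down to simple arithmetic on weight storage. A rough sketch (parameter count is approximate, and this deliberately ignores KV-cache and activation memory, which add a real overhead on top):

```python
def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate memory needed just to hold the model weights, in GB."""
    return n_params * bits_per_weight / 8 / 1e9

if __name__ == "__main__":
    # Mistral 7B: roughly 7.2e9 parameters (approximate figure, an assumption here).
    for name, bits in [("fp16", 16), ("int8", 8), ("AWQ 4-bit", 4)]:
        print(f"{name}: ~{weight_memory_gb(7.2e9, bits):.1f} GB of weights")
```

At fp16 the weights alone need roughly 14 GB, pushing you toward a 24 GB card, while a 4-bit AWQ quantization brings that under 4 GB, leaving room for the KV cache on a much cheaper GPU.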

Syllabus

Serving a model for 100 customers
Video Overview
Choosing a server
Choosing software to serve an API
One-click templates
Tips on GPU selection
Using quantization to fit the model on a cheaper GPU
Vast.ai setup
Serve Mistral with vLLM and AWQ, including concurrent requests
Serving a function calling model
API speed tests, including concurrent
Video Recap


Taught by

Trelis Research

Related Courses

Finetuning, Serving, and Evaluating Large Language Models in the Wild
Open Data Science via YouTube
Cloud Native Sustainable LLM Inference in Action
CNCF [Cloud Native Computing Foundation] via YouTube
Optimizing Kubernetes Cluster Scaling for Advanced Generative Models
Linux Foundation via YouTube
LLaMa for Developers
LinkedIn Learning
Scaling Video Ad Classification Across Millions of Classes with GenAI
Databricks via YouTube