YoVDO

Empower Large Language Models Serving in Production with Cloud Native AI Technologies

Offered By: Linux Foundation via YouTube

Tags

Auto-scaling Courses, OpenAI Courses, KServe Courses

Course Description

Overview

Explore the challenges and solutions for deploying Large Language Models (LLMs) in production environments using cloud native AI technologies. Learn how KServe has been extended to handle OpenAI's streaming requests, accommodating the inference load of LLMs. Discover how Fluid and Vineyard have optimized model loading times, reducing Llama-30B's loading from 10 minutes to under 25 seconds. Understand the importance of cronHPA for timed auto-scaling to balance cost and performance. Gain insights from KServe and Fluid reviewers and maintainers on overcoming production challenges, and learn effective strategies for utilizing cloud native AI in real-world scenarios.
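As a rough illustration of the timed auto-scaling idea mentioned above, the sketch below picks a replica count from a daily schedule. It is not the cronHPA API: the `ScheduleWindow` type, the `desired_replicas` function, and the example hours and replica counts are all hypothetical, chosen only to show how a time-based rule can trade cost against performance.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class ScheduleWindow:
    """A hypothetical daily window with a fixed replica target."""
    start: time
    end: time
    replicas: int


def desired_replicas(now: time, windows: list[ScheduleWindow], default: int) -> int:
    """Return the target of the first window containing `now`.

    Windows are start-inclusive and end-exclusive; outside every window
    the default (off-peak) replica count applies.
    """
    for window in windows:
        if window.start <= now < window.end:
            return window.replicas
    return default


# Illustrative schedule (assumed values): scale up during business hours.
windows = [ScheduleWindow(time(9, 0), time(18, 0), replicas=8)]
```

Calling `desired_replicas(time(10, 0), windows, default=1)` selects the business-hours target, while a call at `time(2, 30)` falls back to the default. A real controller would apply the result to the serving deployment on a schedule rather than compute it on demand.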

Syllabus

Empower Large Language Models (LLMs) Serving in Production with Cloud Native... - Lize Cai & Yang Che


Taught by

Linux Foundation

Related Courses

Serverless Machine Learning Model Inference on Kubernetes with KServe
Devoxx via YouTube
Machine Learning in Fastly's Compute@Edge
Linux Foundation via YouTube
ModelMesh: Scalable AI Model Serving on Kubernetes
Linux Foundation via YouTube
MLSecOps - Automated Online and Offline ML Model Evaluations on Kubernetes
Linux Foundation via YouTube
Creating a Custom Serving Runtime in KServe ModelMesh - Hands-On Experience
Linux Foundation via YouTube