Empower Large Language Models Serving in Production with Cloud Native AI Technologies
Offered By: CNCF [Cloud Native Computing Foundation] via YouTube
Course Description
Overview
Explore the challenges and solutions for deploying Large Language Models (LLMs) in production environments using Cloud Native AI technologies. Learn how to optimize LLM serving by extending KServe to handle OpenAI's streaming requests, reducing model loading time with Fluid and Vineyard, and implementing cost-effective auto-scaling strategies. Gain insights from KServe and Fluid maintainers on overcoming production challenges, and discover practical techniques for balancing performance and cost in LLM deployments. Understand the importance of timed auto-scaling with cronHPA and evaluate the cost-effectiveness of scaling processes. Benefit from real-world experiences and best practices for effectively utilizing Cloud Native AI in production environments.
Syllabus
Empower Large Language Models (LLMs) Serving in Production with Cloud Native... - Lize Cai & Yang Che
Taught by
CNCF [Cloud Native Computing Foundation]
Related Courses
Serverless Machine Learning Model Inference on Kubernetes with KServe (Devoxx via YouTube)
Machine Learning in Fastly's Compute@Edge (Linux Foundation via YouTube)
ModelMesh: Scalable AI Model Serving on Kubernetes (Linux Foundation via YouTube)
MLSecOps - Automated Online and Offline ML Model Evaluations on Kubernetes (Linux Foundation via YouTube)
Creating a Custom Serving Runtime in KServe ModelMesh - Hands-On Experience (Linux Foundation via YouTube)