Empower Large Language Models Serving in Production with Cloud Native AI Technologies
Offered By: Linux Foundation via YouTube
Course Description
Overview
Explore the challenges and solutions for deploying Large Language Models (LLMs) in production environments using cloud native AI technologies. Learn how KServe has been extended to handle OpenAI's streaming requests, accommodating the inference load of LLMs. Discover how Fluid and Vineyard have optimized model loading times, reducing Llama-30B's loading from 10 minutes to under 25 seconds. Understand the importance of cronHPA for timed auto-scaling to balance cost and performance. Gain insights from KServe and Fluid reviewers and maintainers on overcoming production challenges, and learn effective strategies for utilizing cloud native AI in real-world scenarios.
Syllabus
Empower Large Language Models (LLMs) Serving in Production with Cloud Native... Lize Cai & Yang Che
Taught by
Linux Foundation
Tags
Related Courses
Data Engineering on Google Cloud Platform 日本語版Google Cloud via Coursera Cloud Computing Fundamentals on Alibaba Cloud
Alibaba Cloud Academy via Coursera Launch an auto-scaling AWS EC2 virtual machine
Coursera Project Network via Coursera Cloud Computing With Amazon Web Services
Udemy AWS Certified Solution Architect - Associate 2020
Udemy