Accelerating Serverless AI Large Model Inference with Functionalized Scheduling and RDMA
Offered By: CNCF [Cloud Native Computing Foundation] via YouTube
Course Description
Overview
Explore a conference talk on accelerating serverless inference for large AI models through functionalized scheduling and RDMA. Dive into the challenges of deploying large models on standard serverless inference platforms such as KServe, including scheduling inefficiencies and communication bottlenecks. Learn about a highly elastic, functionalized scheduling framework built to schedule thousands of serverless large-model inference task instances within seconds. Discover how RDMA is leveraged to enable high-speed KV cache migration, bypassing the overhead of the traditional kernel network protocol stack. Gain insights into improving resource utilization, reducing costs, and meeting the low-latency, high-throughput demands of large-model inference deployments.
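The talk listing stops short of implementation detail, but the minimal C sketch below illustrates the kind of kernel-bypass transfer RDMA enables for KV cache migration: a cache block is registered with the NIC and copied with a one-sided RDMA WRITE, never traversing the TCP/IP stack. This is an illustrative assumption, not the speakers' code: it uses the standard libibverbs API, connects a queue pair to itself in loopback for brevity (a real prefill-to-decode migration would exchange QP numbers and rkeys between nodes out of band), elides error handling, and assumes an InfiniBand-style port addressed by LID (RoCE would additionally need GID/GRH setup). The file name kv_migrate.c and all buffer sizes are hypothetical.

```c
/* kv_migrate.c -- illustrative sketch, not the talk's implementation.
 * Build: gcc kv_migrate.c -o kv_migrate -libverbs
 * Requires a host with an RDMA-capable device. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <infiniband/verbs.h>

int main(void) {
    int n;
    struct ibv_device **devs = ibv_get_device_list(&n);
    if (!devs || n == 0) { fprintf(stderr, "no RDMA device found\n"); return 1; }
    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ibv_alloc_pd(ctx);

    /* Source and destination "KV cache" blocks. In a real migration the
     * destination buffer lives on the remote (e.g. decode) node. */
    size_t len = 4096;
    char *src = calloc(1, len), *dst = calloc(1, len);
    strcpy(src, "layer-0 KV block");
    struct ibv_mr *smr = ibv_reg_mr(pd, src, len, IBV_ACCESS_LOCAL_WRITE);
    struct ibv_mr *dmr = ibv_reg_mr(pd, dst, len,
            IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE);

    struct ibv_cq *cq = ibv_create_cq(ctx, 16, NULL, NULL, 0);
    struct ibv_qp_init_attr qia = {
        .send_cq = cq, .recv_cq = cq, .qp_type = IBV_QPT_RC,
        .cap = { .max_send_wr = 16, .max_recv_wr = 16,
                 .max_send_sge = 1, .max_recv_sge = 1 },
    };
    struct ibv_qp *qp = ibv_create_qp(pd, &qia);

    /* Loopback shortcut: connect the queue pair to itself. A two-node
     * deployment exchanges QPN, LID/GID, and rkey over TCP beforehand. */
    struct ibv_port_attr port;
    ibv_query_port(ctx, 1, &port);

    struct ibv_qp_attr a = { .qp_state = IBV_QPS_INIT, .pkey_index = 0,
        .port_num = 1, .qp_access_flags = IBV_ACCESS_REMOTE_WRITE };
    ibv_modify_qp(qp, &a, IBV_QP_STATE | IBV_QP_PKEY_INDEX |
                          IBV_QP_PORT | IBV_QP_ACCESS_FLAGS);

    memset(&a, 0, sizeof a);
    a.qp_state = IBV_QPS_RTR; a.path_mtu = IBV_MTU_1024;
    a.dest_qp_num = qp->qp_num; a.rq_psn = 0;
    a.max_dest_rd_atomic = 1; a.min_rnr_timer = 12;
    a.ah_attr.dlid = port.lid; a.ah_attr.port_num = 1;
    ibv_modify_qp(qp, &a, IBV_QP_STATE | IBV_QP_AV | IBV_QP_PATH_MTU |
        IBV_QP_DEST_QPN | IBV_QP_RQ_PSN | IBV_QP_MAX_DEST_RD_ATOMIC |
        IBV_QP_MIN_RNR_TIMER);

    memset(&a, 0, sizeof a);
    a.qp_state = IBV_QPS_RTS; a.sq_psn = 0; a.timeout = 14;
    a.retry_cnt = 7; a.rnr_retry = 7; a.max_rd_atomic = 1;
    ibv_modify_qp(qp, &a, IBV_QP_STATE | IBV_QP_SQ_PSN | IBV_QP_TIMEOUT |
        IBV_QP_RETRY_CNT | IBV_QP_RNR_RETRY | IBV_QP_MAX_QP_RD_ATOMIC);

    /* One-sided RDMA WRITE: the NIC moves the cache block directly,
     * bypassing the kernel network stack and the remote CPU entirely. */
    struct ibv_sge sge = { .addr = (uintptr_t)src, .length = (uint32_t)len,
                           .lkey = smr->lkey };
    struct ibv_send_wr wr = { .wr_id = 1, .sg_list = &sge, .num_sge = 1,
        .opcode = IBV_WR_RDMA_WRITE, .send_flags = IBV_SEND_SIGNALED };
    wr.wr.rdma.remote_addr = (uintptr_t)dst;
    wr.wr.rdma.rkey = dmr->rkey;
    struct ibv_send_wr *bad;
    ibv_post_send(qp, &wr, &bad);

    struct ibv_wc wc;
    while (ibv_poll_cq(cq, 1, &wc) == 0)
        ;                               /* spin until completion arrives */
    printf("write status: %s, dst now: \"%s\"\n",
           ibv_wc_status_str(wc.status), dst);
    return 0;
}
```

The same registration-plus-one-sided-write pattern is what lets a migrated KV cache land in the destination instance's memory without a receive-side system call, which is the property the talk credits with overcoming the limits of the traditional protocol stack.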
Syllabus
Accelerating Serverless AI Large Model Inference with Functionalized Scheduling and RDMA - Yiming Li & Chenglong Wang
Taught by
CNCF [Cloud Native Computing Foundation]
Related Courses
Introduction to Cloud Infrastructure Technologies (Linux Foundation via edX)
Cloud Computing (Indian Institute of Technology, Kharagpur via Swayam)
Elastic Cloud Infrastructure: Containers and Services en Español (Google Cloud via Coursera)
Kyma – A Flexible Way to Connect and Extend Applications (SAP Learning)
Modernize Infrastructure and Applications with Google Cloud (Google Cloud via Coursera)