Accelerating Serverless AI Large Model Inference with Functionalized Scheduling and RDMA

Offered By: CNCF [Cloud Native Computing Foundation] via YouTube

Tags

Serverless Computing Courses, Kubernetes Courses, RDMA Courses, KServe Courses

Course Description

Overview

Explore a conference talk on accelerating serverless inference for large AI models through functionalized scheduling and RDMA. Dive into the challenges of deploying large AI models on standard serverless inference platforms such as KServe, including scheduling inefficiencies and communication bottlenecks. Learn about a highly elastic, functionalized scheduling framework built to achieve second-level scheduling for thousands of serverless inference task instances. Discover how RDMA is leveraged for high-speed KV cache migration, bypassing the overhead of traditional kernel network protocol stacks. Gain insights into improving resource utilization, reducing costs, and meeting the low-latency, high-throughput demands of large-model inference deployments.
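The talk covers the framework design itself; as a loose illustration of the scheduling idea, the Python sketch below shows how binding requests to pre-warmed function instances keeps dispatch at the cost of a queue pop rather than a cold start (container launch plus model load). All names here (WarmPool, FunctionInstance) and the pool design are hypothetical and not taken from the talk.

```python
"""Minimal sketch of function-level scheduling for serverless inference.

Hypothetical illustration only: keeping pre-initialized instances warm so a
request is bound in milliseconds is one plausible reading of "second-level
scheduling"; it is not the talk's actual framework.
"""
import queue
import threading
import time


class FunctionInstance:
    """A pre-warmed worker that already holds the model in memory."""

    def __init__(self, instance_id: str):
        self.instance_id = instance_id

    def infer(self, prompt: str) -> str:
        time.sleep(0.05)  # stand-in for a real model forward pass
        return f"[{self.instance_id}] completion for: {prompt!r}"


class WarmPool:
    """Pool of ready instances; binding a task is a queue pop, not a cold start."""

    def __init__(self, size: int):
        self._idle: "queue.Queue[FunctionInstance]" = queue.Queue()
        for i in range(size):
            self._idle.put(FunctionInstance(f"fn-{i}"))

    def schedule(self, prompt: str) -> str:
        inst = self._idle.get()   # sub-second: no container launch or model load
        try:
            return inst.infer(prompt)
        finally:
            self._idle.put(inst)  # return the instance for reuse


if __name__ == "__main__":
    pool = WarmPool(size=4)
    results: list = []
    start = time.perf_counter()
    threads = [
        threading.Thread(target=lambda p=p: results.append(pool.schedule(p)))
        for p in (f"prompt-{n}" for n in range(8))
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(f"8 requests scheduled and served in {time.perf_counter() - start:.2f}s")
```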
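For the RDMA portion, the sketch below illustrates what migrating a KV cache entails and why the payload size strains the kernel TCP stack: the per-layer key/value tensors produced by prefill are flattened into one contiguous buffer (the kind of region an RDMA write would carry) and rebuilt on the destination, so decoding resumes without re-running prefill. The tensor layout, sizes, and function names are assumptions for illustration; a plain in-process byte buffer stands in for the actual RDMA transfer.

```python
"""Sketch of KV cache migration between inference instances.

Hypothetical illustration, not the talk's implementation. For a 1024-token
request at these (assumed) model dimensions the cache is ~512 MiB, which is
why moving it over RDMA instead of the kernel network stack matters.
"""
import numpy as np

NUM_LAYERS, NUM_HEADS, SEQ_LEN, HEAD_DIM = 32, 32, 1024, 128
SHAPE = (NUM_HEADS, SEQ_LEN, HEAD_DIM)


def build_kv_cache() -> dict:
    """Per-layer K and V tensors, as produced by prefill on the source instance."""
    return {
        layer: {
            "k": np.random.rand(*SHAPE).astype(np.float16),
            "v": np.random.rand(*SHAPE).astype(np.float16),
        }
        for layer in range(NUM_LAYERS)
    }


def serialize(cache: dict) -> bytes:
    """Flatten all tensors into one contiguous buffer (what an RDMA write would carry)."""
    parts = []
    for layer in range(NUM_LAYERS):
        parts.append(cache[layer]["k"].tobytes())
        parts.append(cache[layer]["v"].tobytes())
    return b"".join(parts)


def deserialize(buf: bytes) -> dict:
    """Rebuild the per-layer tensors on the destination instance."""
    step = int(np.prod(SHAPE)) * 2  # float16 = 2 bytes per element
    cache, offset = {}, 0
    for layer in range(NUM_LAYERS):
        k = np.frombuffer(buf[offset:offset + step], dtype=np.float16).reshape(SHAPE)
        v = np.frombuffer(buf[offset + step:offset + 2 * step], dtype=np.float16).reshape(SHAPE)
        cache[layer] = {"k": k, "v": v}
        offset += 2 * step
    return cache


if __name__ == "__main__":
    src_cache = build_kv_cache()
    payload = serialize(src_cache)
    print(f"KV cache for one 1024-token request: {len(payload) / 2**20:.0f} MiB")
    dst_cache = deserialize(payload)  # destination resumes decoding from here
    assert np.array_equal(src_cache[0]["k"], dst_cache[0]["k"])
```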

Syllabus

Accelerating Serverless AI Large Model Inference with Functionalized Scheduling and RDMA - Yiming Li & Chenglong Wang


Taught by

CNCF [Cloud Native Computing Foundation]

Related Courses

Introduction to Cloud Infrastructure Technologies
Linux Foundation via edX
Scalable Microservices with Kubernetes
Google via Udacity
Google Cloud Fundamentals: Core Infrastructure
Google via Coursera
Introduction to Kubernetes
Linux Foundation via edX
Fundamentals of Containers, Kubernetes, and Red Hat OpenShift
Red Hat via edX