YoVDO

Effortless Scalability: Orchestrating Large Language Model Inference with Kubernetes

Offered By: CNCF [Cloud Native Computing Foundation] via YouTube

Tags

Kubernetes Courses
Scalability Courses
Model Deployment Courses
Containerization Courses
Cloud Native Computing Courses
Custom Resource Definitions Courses

Course Description

Overview

Explore the intricacies of deploying and orchestrating large open-source inference models on Kubernetes in this 23-minute conference talk from CNCF. Discover how to automate the deployment of heavyweight models like Falcon and Llama 2 using Kubernetes Custom Resource Definitions (CRDs) for seamless management of large model files through container images. Learn about streamlining deployment with an HTTP server for inference calls, eliminating manual tuning of deployment parameters, and auto-provisioning GPU nodes based on specific model requirements. Gain insights into empowering users to deploy containerized models effortlessly by providing pod templates in the workspace custom resource inference field. Understand how the controller dynamically creates deployment workloads utilizing all GPU nodes, ensuring optimal resource utilization in the AI/ML landscape.

Syllabus

Effortless Scalability: Orchestrating Large Language Model Inference... Rohit Ghumare & Joinal Ahmed


Taught by

CNCF [Cloud Native Computing Foundation]

Related Courses

Fundamentals of Containers, Kubernetes, and Red Hat OpenShift
Red Hat via edX
Configuration Management for Containerized Delivery
Microsoft via edX
Getting Started with Google Kubernetes Engine - Español
Google Cloud via Coursera
Getting Started with Google Kubernetes Engine - 日本語版
Google Cloud via Coursera
Architecting with Google Kubernetes Engine: Foundations en Español
Google Cloud via Coursera