YoVDO

Effortless Scalability - Orchestrating Large Language Model Inference with Kubernetes

Offered By: CNCF [Cloud Native Computing Foundation] via YouTube

Tags

Kubernetes Courses
Scalability Courses
Model Deployment Courses
Containerization Courses
Cloud Native Computing Courses
Custom Resource Definitions Courses

Course Description

Overview

This 23-minute conference talk from the Cloud Native Computing Foundation (CNCF) covers deploying and orchestrating large open-source inference models on Kubernetes. It shows how Kubernetes Custom Resource Definitions (CRDs) can automate the deployment of heavyweight models such as Falcon and Llama 2, how large model files can be managed through container images, and how an HTTP server for inference calls streamlines deployment. The talk also covers eliminating manual tuning of deployment parameters, auto-provisioning GPU nodes based on model requirements, and letting users deploy containerized models with little effort. It also addresses dynamically creating deployment workloads that use all GPU nodes, with the aim of optimizing resource utilization for AI/ML workloads.
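As a rough illustration of what "auto-provisioning GPU nodes based on model requirements" could look like, the sketch below sizes a node count from a model's GPU memory requirement and builds a custom-resource-style manifest as a plain dictionary. It is not taken from the talk: the `ModelDeployment` kind, the `example.com/v1alpha1` API group, the field names, the image URL, and all memory figures are hypothetical placeholders chosen for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class ModelRequirement:
    """Hypothetical description of what a model needs in order to serve."""
    name: str
    gpu_memory_gib: float  # total GPU memory the model needs (illustrative)


def gpu_nodes_needed(req: ModelRequirement, gpu_memory_per_node_gib: float) -> int:
    """Return how many GPU nodes are needed to fit the model's memory."""
    if gpu_memory_per_node_gib <= 0:
        raise ValueError("gpu_memory_per_node_gib must be positive")
    # Round up: a partially used node still has to be provisioned.
    return max(1, math.ceil(req.gpu_memory_gib / gpu_memory_per_node_gib))


def build_custom_resource(req: ModelRequirement,
                          gpu_memory_per_node_gib: float,
                          image: str) -> dict:
    """Build a manifest for a hypothetical 'ModelDeployment' custom resource.

    The apiVersion, kind, and spec fields are assumptions for illustration,
    not the schema of any real operator.
    """
    return {
        "apiVersion": "example.com/v1alpha1",  # hypothetical API group
        "kind": "ModelDeployment",             # hypothetical kind
        "metadata": {"name": req.name},
        "spec": {
            "image": image,  # container image that packages the model files
            "nodeCount": gpu_nodes_needed(req, gpu_memory_per_node_gib),
        },
    }


# Illustrative numbers only: a model needing 70 GiB on 24 GiB GPU nodes.
demo = ModelRequirement(name="demo-model", gpu_memory_gib=70)
manifest = build_custom_resource(
    demo,
    gpu_memory_per_node_gib=24,
    image="registry.example.com/demo-model:latest",
)
print(manifest["spec"]["nodeCount"])
```

In a setup like this, the user would supply only the model name and image, and a controller watching the custom resource would derive the node count, which is the kind of manual tuning the talk describes removing.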

Syllabus

Effortless Scalability: Orchestrating Large Language Model Inference with Kubernetes


Taught by

CNCF [Cloud Native Computing Foundation]

Related Courses

Kubernetes: Cloud Native Ecosystem
LinkedIn Learning
Cloud Native Certified Kubernetes Administrator (CKA) (Legacy)
A Cloud Guru
Implement Resiliency in a Cloud-Native ASP.NET Core Microservice
Microsoft via YouTube
Open Networking & Edge Executive Forum 2021 - Day 1 Part 2 Sessions
Linux Foundation via YouTube