YoVDO

Effortless Scalability: Orchestrating Large Language Model Inference with Kubernetes

Offered By: CNCF [Cloud Native Computing Foundation] via YouTube

Tags

Kubernetes Courses, Inference Courses, Scalability Courses, Orchestration Courses, Containerization Courses, Custom Resource Definitions Courses

Course Description

Overview

This 27-minute CNCF conference talk covers deploying and orchestrating large open-source inference models on Kubernetes. It shows how to automate the deployment of heavyweight models such as Falcon and Llama 2 with Kubernetes Custom Resource Definitions (CRDs), managing the large model files through container images. The talk explains how an HTTP server for inference calls streamlines deployment, and how preset configurations remove the need to tune deployment parameters by hand. It also covers auto-provisioning GPU nodes based on each model's requirements, so that users can deploy containerized models with little effort, and a controller-based approach that dynamically creates deployment workloads using all GPU nodes.

Syllabus

Effortless Scalability: Orchestrating Large Language Model Inference... - Joinal Ahmed & Nirav Kumar


Taught by

CNCF [Cloud Native Computing Foundation]

Related Courses

First Nights - Berlioz’s Symphonie Fantastique and Program Music in the 19th Century
Harvard University via edX
Azure Application Deployment and Management
Microsoft via edX
Building Modern Node.js Applications on AWS
Amazon Web Services via edX
Implementation Strategies: Cloud Computing
The University of British Columbia via edX
Introducción a Contenedores con Docker y Kubernetes
IBM via Coursera