Effortless Scalability: Orchestrating Large Language Model Inference with Kubernetes
Offered By: CNCF [Cloud Native Computing Foundation] via YouTube
Course Description
Overview
Explore how to deploy and orchestrate large open-source inference models on Kubernetes in this 23-minute conference talk from the Cloud Native Computing Foundation (CNCF). Dive into automating deployments of heavyweight models such as Falcon and Llama 2 using Kubernetes Custom Resource Definitions (CRDs). Learn how to manage large model files through container images and streamline deployment with an HTTP server that serves inference calls. Discover techniques for eliminating manual tuning of deployment parameters, auto-provisioning GPU nodes based on model requirements, and enabling users to deploy containerized models with minimal effort. Gain insight into dynamically creating deployment workloads that utilize all available GPU nodes, improving resource utilization in the AI/ML landscape.
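To make the CRD-driven workflow concrete, here is a minimal sketch of what a custom resource for such an operator might look like. The API group inference.example.com, the ModelDeployment kind, and every field name below are illustrative assumptions, not the actual schema presented in the talk.

```yaml
# Hypothetical custom resource illustrating the pattern described above.
# The group, kind, and all fields are assumptions, not the talk's CRD.
apiVersion: inference.example.com/v1alpha1
kind: ModelDeployment
metadata:
  name: falcon-7b-inference
spec:
  model:
    name: falcon-7b                               # preset open-source model
    image: registry.example.com/models/falcon-7b  # model files packaged as a container image
  gpu:
    count: 1                                      # controller provisions matching GPU nodes
    instanceType: Standard_NC24ads_A100_v4        # assumed node SKU; varies by cloud
  inference:
    replicas: 2
    port: 80                                      # HTTP endpoint for inference calls
```

In this pattern, applying the resource (for example with kubectl apply -f) delegates the heavy lifting to the operator's controller: pulling the model container image, provisioning GPU nodes that satisfy the model's requirements, and exposing the HTTP endpoint for inference calls.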
Syllabus
Effortless Scalability: Orchestrating Large Language Model Inference with Kubernetes
Taught by
CNCF [Cloud Native Computing Foundation]
Related Courses
Developing a Tabular Data Model (Microsoft via edX)
Data Science in Action - Building a Predictive Churn Model (SAP Learning)
Serverless Machine Learning with Tensorflow on Google Cloud Platform 日本語版 (Google Cloud via Coursera)
Intro to TensorFlow em Português Brasileiro (Google Cloud via Coursera)
Serverless Machine Learning con TensorFlow en GCP (Google Cloud via Coursera)