Effortless Scalability: Orchestrating Large Language Model Inference with Kubernetes
Offered By: CNCF [Cloud Native Computing Foundation] via YouTube
Course Description
Overview
Explore the intricacies of deploying and orchestrating large open-source inference models on Kubernetes in this 27-minute conference talk from CNCF. Dive into automating the deployment of heavyweight models like Falcon and Llama 2 using Kubernetes Custom Resource Definitions (CRDs), with large model files managed through container images. Learn how an HTTP server streamlines inference calls and how preset configurations eliminate manual tuning of deployment parameters. Discover techniques for auto-provisioning GPU nodes based on specific model requirements, empowering users to deploy containerized models effortlessly. Gain insights into the dynamic creation of deployment workloads that utilize all provisioned GPU nodes through a controller-based approach.
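The controller-based approach the talk describes is typically driven by a custom resource that captures the model preset, container image, and GPU requirements in one manifest. The sketch below is purely illustrative — the API group, kind, and field names are assumptions for this description, not the actual API presented in the talk:

```yaml
# Hypothetical custom resource for an LLM inference workload.
# All group, kind, and field names here are illustrative only.
apiVersion: inference.example.com/v1alpha1
kind: ModelDeployment
metadata:
  name: falcon-7b-serving
spec:
  preset: falcon-7b            # preset supplies tuned deployment parameters, no manual tuning
  image: registry.example.com/models/falcon-7b:latest  # model files packaged as a container image
  resources:
    gpuType: nvidia-a100       # controller auto-provisions GPU nodes matching this requirement
    nodeCount: 2
  inference:
    protocol: http             # an HTTP server fronts the inference calls
    port: 8080
```

A controller watching resources of this kind would reconcile each one into the underlying Kubernetes workloads, spreading the inference replicas across the GPU nodes it provisioned.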
Syllabus
Effortless Scalability: Orchestrating Large Language Model Inference with Kubernetes - Joinal Ahmed & Nirav Kumar
Taught by
CNCF [Cloud Native Computing Foundation]
Related Courses
Financial Sustainability: The Numbers Side of Social Enterprise
+Acumen via NovoEd
Cloud Computing Concepts: Part 2
University of Illinois at Urbana-Champaign via Coursera
Developing Repeatable Models® to Scale Your Impact
+Acumen via Independent
Managing Microsoft Windows Server Active Directory Domain Services
Microsoft via edX
Introduction aux conteneurs
Microsoft Virtual Academy via OpenClassrooms