vLLM on Kubernetes in Production - Deployment and Cost-Saving Strategies

Offered By: Kubesimplify via YouTube

Tags

vLLM Courses, Machine Learning Courses, Cloud Computing Courses, Kubernetes Courses, GPU Computing Courses, Scalability Courses, Containerization Courses

Course Description

Overview

Explore the fundamentals of vLLM, a fast and easy-to-use library for LLM inference and serving, in this 28-minute video tutorial. Learn how to run vLLM locally, then deploy it to a production Kubernetes cluster with GPU-attached nodes using a DaemonSet. A hands-on demonstration walks through the deployment step by step, and a real-world case study, detailed in the accompanying blog post, shows how open-source AI technologies can be run cost-effectively. Presented by John McBride, this Kubesimplify tutorial offers practical guidance for serving vLLM on Kubernetes efficiently.
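
To give a sense of what "running vLLM locally" looks like, here is a minimal sketch using vLLM's offline inference API; the model name, prompts, and sampling settings are illustrative assumptions, not taken from the video.

```python
# Minimal local vLLM example (illustrative; model and settings are assumptions).
# Install first: pip install vllm  (most models require a CUDA-capable GPU)
from vllm import LLM, SamplingParams

# Load a small model; the tutorial's actual model choice may differ.
llm = LLM(model="facebook/opt-125m")

# Sampling parameters control generation; the values here are arbitrary examples.
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Generate completions for a batch of prompts.
outputs = llm.generate(["Kubernetes is", "vLLM makes serving"], params)
for out in outputs:
    print(out.prompt, "->", out.outputs[0].text)
```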
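
The production pattern the tutorial describes, one vLLM server per GPU-attached node, is typically expressed as a Kubernetes DaemonSet. The manifest below is a hedged sketch assuming the public vllm/vllm-openai container image, an nvidia.com/gpu resource request, and a hypothetical gpu=true node label; the actual manifest from the case study will differ.

```yaml
# Illustrative DaemonSet sketch: one vLLM pod on each GPU node.
# Image, labels, model, and resources are assumptions, not the video's exact config.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: vllm
spec:
  selector:
    matchLabels:
      app: vllm
  template:
    metadata:
      labels:
        app: vllm
    spec:
      nodeSelector:
        gpu: "true"                 # hypothetical label marking GPU-attached nodes
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest
          args: ["--model", "facebook/opt-125m"]  # model is a placeholder
          ports:
            - containerPort: 8000   # vLLM's OpenAI-compatible API port
          resources:
            limits:
              nvidia.com/gpu: 1     # requires the NVIDIA device plugin
```

A DaemonSet schedules exactly one replica on every matching node, which pairs naturally with one-GPU-per-node clusters; a cluster with several GPUs per node would more likely use a Deployment sized to the available GPU count.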

Syllabus

vLLM on Kubernetes in Production


Taught by

Kubesimplify

Related Courses

Finetuning, Serving, and Evaluating Large Language Models in the Wild
Open Data Science via YouTube
Cloud Native Sustainable LLM Inference in Action
CNCF [Cloud Native Computing Foundation] via YouTube
Optimizing Kubernetes Cluster Scaling for Advanced Generative Models
Linux Foundation via YouTube
LLaMa for Developers
LinkedIn Learning
Scaling Video Ad Classification Across Millions of Classes with GenAI
Databricks via YouTube