vLLM on Kubernetes in Production - Deployment and Cost-Saving Strategies
Offered By: Kubesimplify via YouTube
Course Description
Overview
Explore the fundamentals of vLLM, a fast and easy-to-use library for LLM inference and serving, in this 28-minute video tutorial. Learn how to run vLLM locally and how to deploy it to a production Kubernetes cluster, using a DaemonSet to run it on GPU-attached nodes. A hands-on demonstration walks through each step of putting vLLM into production. A real-world case study, detailed in the accompanying blog post, shows how to deploy open-source AI technologies cost-effectively. Presented by John McBride, this Kubesimplify tutorial offers practical guidance on running vLLM efficiently on Kubernetes.
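To make the DaemonSet approach concrete, here is a minimal sketch that builds a Kubernetes DaemonSet manifest for a vLLM server as a Python dict and prints it as JSON, which kubectl accepts as well as YAML. This is not the manifest used in the video. The model name, image tag, and the `gpu: "true"` node label are hypothetical placeholders, and the `nvidia.com/gpu` resource assumes the NVIDIA device plugin is installed on the GPU nodes. The container arguments follow the `vllm/vllm-openai` image convention of passing `--model` to its OpenAI-compatible server, which listens on port 8000 by default.

```python
import json

# Hypothetical values: substitute your own model, image tag, and node label.
MODEL = "facebook/opt-125m"
IMAGE = "vllm/vllm-openai:latest"
GPU_NODE_LABEL = {"gpu": "true"}  # assumed label applied to GPU-attached nodes


def vllm_daemonset(name: str = "vllm") -> dict:
    """Build a minimal DaemonSet that runs one vLLM server per labeled GPU node."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            # The selector must match the pod template's labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Schedule pods only on nodes carrying the (assumed) GPU label.
                    "nodeSelector": GPU_NODE_LABEL,
                    "containers": [
                        {
                            "name": "vllm",
                            "image": IMAGE,
                            # The image's entrypoint starts the OpenAI-compatible
                            # server; these args are passed through to it.
                            "args": ["--model", MODEL],
                            "ports": [{"containerPort": 8000}],
                            # Requests one GPU; requires the NVIDIA device plugin.
                            "resources": {"limits": {"nvidia.com/gpu": 1}},
                        }
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    # Usage: python vllm_ds.py > vllm.json && kubectl apply -f vllm.json
    print(json.dumps(vllm_daemonset(), indent=2))
```

A DaemonSet places exactly one pod on every node that matches the selector, so adding a labeled GPU node to the cluster automatically brings up another vLLM replica; a Deployment would instead require you to manage the replica count yourself.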
Syllabus
vLLM on Kubernetes in Production
Taught by
Kubesimplify