Preemption Chaos and Optimizing Server Startup for LLMs in Production - Part 2

Offered By: MLOps.community via YouTube

Tags

Cloud Cost Optimization Courses, Kubernetes Courses, MLOps Courses, GPU Computing Courses, High Availability Courses

Course Description

Overview

Discover how Replit optimized their GPU-enabled infrastructure for serving Large Language Models in production during this 13-minute lightning talk from the LLMs in Prod Conference. Learn about the challenges and solutions involved in switching to preemptible GKE nodes, managing the resulting chaos, and achieving significant cost savings while improving uptime. Gain insights from Bradley Heilbrun, a Replit engineer with extensive experience in reliable and scalable LLM infrastructure, as he shares the story of how his team successfully implemented these changes. Explore strategies for balancing cost-effectiveness and high availability in cloud-based LLM services, drawing from Heilbrun's background as YouTube's first SRE and his experience at Google and PayPal.
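The core idea covered in the talk — serving LLMs from preemptible (Spot) GKE nodes while tolerating the "chaos" of preemptions — can be sketched as a Kubernetes Deployment fragment. This is a minimal illustration based on documented GKE behavior, not Replit's actual configuration; the deployment name and image are hypothetical:

```yaml
# Hypothetical Deployment fragment: schedule an LLM server onto GKE Spot VMs.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-server                # hypothetical name
spec:
  replicas: 3                     # multiple replicas absorb individual preemptions
  selector:
    matchLabels:
      app: llm-server
  template:
    metadata:
      labels:
        app: llm-server
    spec:
      # GKE labels and taints Spot nodes automatically; opting in requires
      # both a nodeSelector and a matching toleration.
      nodeSelector:
        cloud.google.com/gke-spot: "true"
      tolerations:
        - key: cloud.google.com/gke-spot
          operator: Equal
          value: "true"
          effect: NoSchedule
      # Spot VMs receive only a short preemption notice, so the server
      # must shut down gracefully within a tight grace period.
      terminationGracePeriodSeconds: 25
      containers:
        - name: server
          image: example.com/llm-server:latest   # hypothetical image
```

The trade-off this sketch encodes is the one the talk addresses: Spot nodes cut GPU costs substantially, but preemptions make fast server startup and graceful shutdown essential for maintaining uptime.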

Syllabus

Preemption Chaos and Optimizing Server Startup // Bradley Heilbrun // LLMs in Prod Conference Part 2


Taught by

MLOps.community

Related Courses

Biomolecular Modeling on GPU (Моделирование биологических молекул на GPU)
Moscow Institute of Physics and Technology via Coursera
Practical Deep Learning For Coders
fast.ai via Independent
GPU Architectures And Programming
Indian Institute of Technology, Kharagpur via Swayam
Perform Real-Time Object Detection with YOLOv3
Coursera Project Network via Coursera
Getting Started with PyTorch
Coursera Project Network via Coursera