
Overcoming Challenges in Serving Large Language Models - SREcon23 Europe/Middle East/Africa

Offered By: USENIX via YouTube

Tags

Site Reliability Engineering (SRE) Courses
Kubernetes Courses
Quantization Courses
GPU Computing Courses

Course Description

Overview

Explore the intricacies of hosting GPT-type models in a Kubernetes cluster with multi-GPU nodes in this 31-minute conference talk from SREcon23 Europe/Middle East/Africa. Delve into the challenges SREs face when providing custom GPT model capabilities within organizations, including managing large model sizes, implementing GPU sharding, and utilizing tensor parallelism. Learn about various model file formats, quantization techniques, and the benefits of open-source tools like Huggingface Accelerate. Gain valuable insights into balancing serving latency, prediction accuracy, and distributed serving, while discovering best practices for optimizing resource allocation. Watch a live demonstration showcasing the performance and trade-offs of a GPT-based model, equipping you with practical knowledge to effectively host and manage large language models in your own environment.
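The trade-off the talk describes between model size, quantization level, and the number of GPUs needed for sharding can be sketched with back-of-the-envelope arithmetic. The following is an illustrative estimate only (the byte-per-parameter figures and the 20% activation/KV-cache overhead are common rules of thumb, not numbers from the talk):

```python
import math

# Approximate storage cost per model parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Memory needed for the model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30

def min_gpus(num_params: float, dtype: str, gpu_mem_gb: float,
             overhead: float = 0.2) -> int:
    """Minimum GPUs to shard the weights across, reserving `overhead`
    of each GPU's memory for activations and KV cache (a crude heuristic)."""
    usable = gpu_mem_gb * (1 - overhead)
    return math.ceil(weight_memory_gb(num_params, dtype) / usable)

# Example: a hypothetical 13B-parameter model served on 24 GB GPUs.
params = 13e9
for dt in ("fp16", "int8", "int4"):
    gb = weight_memory_gb(params, dt)
    print(f"{dt}: {gb:.1f} GiB of weights -> {min_gpus(params, dt, 24)} GPU(s)")
```

At fp16 the weights alone roughly fill a single 24 GB card, forcing sharding across two GPUs, while int8 or int4 quantization fits on one; this is the latency/accuracy/footprint balance the talk explores.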

Syllabus

SREcon23 Europe/Middle East/Africa - Overcoming Challenges in Serving Large Language Models


Taught by

USENIX

Related Courses

Site Reliability Engineering: Measuring and Managing Reliability
Google Cloud via Coursera
Introduction to DevOps and Site Reliability Engineering
Linux Foundation via edX
Developing a Google SRE Culture
Google Cloud via Coursera
Preparing for Google Cloud Certification: Cloud DevOps Engineer
Google Cloud via Coursera
Organizational Change and Culture for Adopting Google Cloud
Google Cloud via Coursera