Overcoming Challenges in Serving Large Language Models - SREcon23 Europe/Middle East/Africa
Offered By: USENIX via YouTube
Course Description
Overview
Explore the intricacies of hosting GPT-type models in a Kubernetes cluster with multi-GPU nodes in this 31-minute conference talk from SREcon23 Europe/Middle East/Africa. Delve into the challenges SREs face when providing custom GPT model capabilities within their organizations, including managing large model sizes, sharding models across GPUs, and using tensor parallelism. Learn about the various model file formats, quantization techniques, and the benefits of open-source tools such as Hugging Face Accelerate. Gain insights into balancing serving latency, prediction accuracy, and distributed serving, and discover best practices for optimizing resource allocation. A live demonstration shows the performance and trade-offs of a GPT-based model, equipping you with practical knowledge for hosting and managing large language models in your own environment.
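As a rough illustration of why model size drives sharding and quantization decisions, the weight footprint of a model can be estimated from its parameter count and numeric precision. The Python sketch below is not taken from the talk. The 7B-parameter figure, the bytes-per-parameter table, and the function name `weight_memory_gib` are illustrative assumptions. It counts weights only (ignoring KV cache, activations, and runtime overhead) and assumes the weights are split evenly across GPUs.

```python
# Back-of-the-envelope weight-memory estimate for serving an LLM.
# Assumptions (illustrative, not from the talk): weights only, no KV cache,
# activations, or framework overhead; weights split evenly across GPUs.

# Approximate storage cost per parameter for common precisions.
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16": 2.0,  # bf16 has the same size
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gib(num_params: float, dtype: str, num_gpus: int = 1) -> float:
    """Return the approximate weight memory per GPU, in GiB."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    total_bytes = num_params * BYTES_PER_PARAM[dtype]
    return total_bytes / num_gpus / 2**30


if __name__ == "__main__":
    params = 7e9  # hypothetical 7B-parameter GPT-style model
    for dtype in ("fp32", "fp16", "int8", "int4"):
        for gpus in (1, 2, 4):
            per_gpu = weight_memory_gib(params, dtype, gpus)
            print(f"{dtype:>5} x {gpus} GPU(s): {per_gpu:6.2f} GiB per GPU")
```

The table it prints shows the basic trade-off: halving precision and doubling the GPU count reduce per-GPU weight memory by the same factor, but quantization may cost prediction accuracy while sharding adds inter-GPU communication and therefore latency.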
Syllabus
SREcon23 Europe/Middle East/Africa - Overcoming Challenges in Serving Large Language Models
Taught by
USENIX