Large Model Training and Inference with DeepSpeed

Offered By: MLOps.community via YouTube

Tags

PyTorch Courses, Distributed Systems Courses, Model Training Courses

Course Description

Overview

Explore the journey of DeepSpeed and its transformative impact on large model training and inference in this 36-minute conference talk by Samyam Rajbhandari at the LLMs in Prod Conference. Discover how technologies like ZeRO and 3D-Parallelism have become fundamental building blocks for training large language models at scale, powering LLMs such as Bloom-176B and Megatron-Turing 530B. Learn about heterogeneous memory training systems like ZeRO-Offload and ZeRO-Infinity, which have democratized LLMs by making them trainable with limited resources.

Gain insights into DeepSpeed-Inference and DeepSpeed-MII, which simplify the application of powerful inference optimizations to accelerate LLMs for deployment. Understand how DeepSpeed has been integrated into platforms like HuggingFace, PyTorch Lightning, and Mosaic ML, and how its technologies are offered in PyTorch, Colossal-AI, and Megatron-LM. Delve into the motivations, insights, and stories behind the development of these groundbreaking technologies that have revolutionized large language model training and inference.
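To make the ZeRO and ZeRO-Offload ideas above concrete, here is a minimal sketch of a DeepSpeed configuration that enables ZeRO stage 3 with CPU offloading. The key names follow DeepSpeed's documented JSON config schema; the specific batch size and dtype settings are illustrative assumptions, not values from the talk.

```python
import json

# Illustrative DeepSpeed config: ZeRO stage 3 partitions parameters,
# gradients, and optimizer state across data-parallel workers, and the
# offload settings move optimizer state and parameters to CPU memory
# (the ZeRO-Offload / ZeRO-Infinity approach mentioned in the talk).
ds_config = {
    "train_batch_size": 16,          # assumed value for illustration
    "fp16": {"enabled": True},       # mixed-precision training
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu"},
        "offload_param": {"device": "cpu"},
    },
}

print(json.dumps(ds_config, indent=2))
```

In practice this dict (or an equivalent JSON file) is passed to `deepspeed.initialize(model=..., config=ds_config)`; running that requires the `deepspeed` package and GPUs, so only the configuration itself is shown here.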

Syllabus

Large Model Training and Inference with DeepSpeed // Samyam Rajbhandari // LLMs in Prod Conference


Taught by

MLOps.community

Related Courses

Advanced Operating Systems
Georgia Institute of Technology via Udacity
High Performance Computing
Georgia Institute of Technology via Udacity
GT - Refresher - Advanced OS
Georgia Institute of Technology via Udacity
Distributed Machine Learning with Apache Spark
University of California, Berkeley via edX
CS125x: Advanced Distributed Machine Learning with Apache Spark
University of California, Berkeley via edX