Optimizing Large-Scale Model Training with Ray Compiled Graphs
Offered By: Anyscale via YouTube
Course Description
Overview
Explore advanced techniques for training large-scale models in this Ray Summit 2024 conference talk. Discover how Ray Core's latest features enhance training efficiency for LLMs and multimodal AI models. Learn about Ray's native GPU-to-GPU communication and pre-compiled execution paths, and how they apply to the complex data and control flows of distributed model training. Gain insights into implementing pipeline parallelism and training multimodal models on heterogeneous GPUs. Compare Ray-based implementations against ones written directly with NCCL and PyTorch, with a focus on simplicity and maintainability. Examine benchmarks on throughput and GPU utilization to understand the practical benefits of these optimizations. Ideal for ML researchers and engineers working on large-scale AI projects, this talk provides valuable knowledge on maximizing accelerator utilization and improving training efficiency in the era of increasingly large and complex models.
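To make the "pre-compiled execution paths" idea concrete, the following is a minimal sketch using Ray's experimental Compiled Graphs API. The names InputNode and experimental_compile match recent Ray releases, but the feature is experimental and may change between versions; the Worker actor and its forward method are illustrative placeholders, not code from the talk.

    import ray
    from ray.dag import InputNode

    @ray.remote
    class Worker:
        # Illustrative placeholder for one pipeline stage's forward pass.
        def forward(self, batch):
            return batch * 2

    ray.init()
    worker = Worker.remote()

    # Define the dataflow graph once; compiling it sets up the execution
    # path (actor scheduling, communication buffers) ahead of time.
    with InputNode() as batch:
        dag = worker.forward.bind(batch)

    compiled_dag = dag.experimental_compile()

    # Repeated executions reuse the pre-compiled path instead of going
    # through Ray's normal per-task dispatch.
    for step in range(3):
        print(ray.get(compiled_dag.execute(step)))

In the setting the talk describes, each actor would hold a model shard on its own GPU, and edges between actors would carry tensors directly over NCCL rather than through Ray's object store; the annotation API for choosing that transport is also experimental and version-dependent, so it is not shown here.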
Syllabus
Optimizing Large-Scale Model Training with Ray Compiled Graphs | Ray Summit 2024
Taught by
Anyscale
Related Courses
Cloud Computing Concepts, Part 1 (University of Illinois at Urbana-Champaign via Coursera)
Cloud Computing Concepts: Part 2 (University of Illinois at Urbana-Champaign via Coursera)
Reliable Distributed Algorithms - Part 1 (KTH Royal Institute of Technology via edX)
Introduction to Apache Spark and AWS (University of London International Programmes via Coursera)
Réalisez des calculs distribués sur des données massives [Perform Distributed Computations on Massive Data] (CentraleSupélec via OpenClassrooms)