Alpa: Simple Large Model Training and Inference on Ray
Offered By: Anyscale via YouTube
Course Description
Overview
Explore the capabilities of Alpa, a Ray-native library designed for automated training and serving of large models such as GPT-3. Discover how Alpa simplifies model-parallel training of complex deep learning models by generating execution plans that unify data, operator, and pipeline parallelism. Learn about Alpa's innovative approach to distributing training across two hierarchical levels of parallelism: inter-operator and intra-operator. Understand how Alpa constructs a new hierarchical space for massive model-parallel execution plans and uses compilation passes to derive optimal parallel execution plans. Examine Alpa's efficient runtime, which orchestrates this two-level parallel execution on distributed compute devices. Compare Alpa's performance with that of hand-tuned model-parallel training systems and explore its versatility in handling models with heterogeneous architectures. Delve into both the algorithmic aspects and the engineering and systems implementation, with a focus on Ray's crucial role as a building block of the Alpa runtime. This 31-minute talk from Anyscale at Ray Summit provides valuable insights into advanced techniques for scaling complex deep learning models across distributed computing environments.
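To make the decorator-based workflow described above concrete, here is a minimal sketch of how Alpa parallelizes a JAX training step on a Ray cluster. It assumes the `alpa` package and a running Ray cluster; the toy MLP, batch shapes, and training-state setup are illustrative placeholders, not code from the talk.

```python
import alpa
import jax
import jax.numpy as jnp
import flax.linen as nn
import optax
from flax.training.train_state import TrainState

# Connect to the Ray cluster; Ray serves as the distributed runtime
# that hosts Alpa's mesh workers across GPU nodes.
alpa.init(cluster="ray")

# A toy two-layer MLP standing in for a large model (illustrative only).
class MLP(nn.Module):
    @nn.compact
    def __call__(self, x):
        x = nn.Dense(1024)(x)
        x = nn.relu(x)
        return nn.Dense(1024)(x)

model = MLP()
variables = model.init(jax.random.PRNGKey(0), jnp.ones((8, 1024)))
state = TrainState.create(apply_fn=model.apply,
                          params=variables["params"],
                          tx=optax.adam(1e-3))

# The decorator replaces jax.jit: Alpa's compilation passes search the
# hierarchical space of inter-operator (pipeline) and intra-operator
# (sharding) plans and return a distributed parallel executable.
@alpa.parallelize
def train_step(state, batch):
    def loss_fn(params):
        out = state.apply_fn({"params": params}, batch["x"])
        return jnp.mean((out - batch["y"]) ** 2)

    # alpa.grad (instead of jax.grad) lets the compiler handle
    # gradient accumulation across pipeline stages.
    grads = alpa.grad(loss_fn)(state.params)
    return state.apply_gradients(grads=grads)

batch = {"x": jnp.ones((8, 1024)), "y": jnp.ones((8, 1024))}
state = train_step(state, batch)
```

Alpa also accepts a parallelization method option on the decorator (for example, `alpa.PipeshardParallel` to enable combined pipeline and sharding parallelism), but the zero-configuration default shown here is the workflow the talk emphasizes.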
Syllabus
Alpa - Simple large model training and inference on Ray
Taught by
Anyscale
Related Courses
Cloud Computing Concepts, Part 1 - University of Illinois at Urbana-Champaign via Coursera
Cloud Computing Concepts: Part 2 - University of Illinois at Urbana-Champaign via Coursera
Reliable Distributed Algorithms - Part 1 - KTH Royal Institute of Technology via edX
Introduction to Apache Spark and AWS - University of London International Programmes via Coursera
Réalisez des calculs distribués sur des données massives (Perform Distributed Computations on Massive Data) - CentraleSupélec via OpenClassrooms