YoVDO

Large-Scale Distributed Training with TorchX and Ray

Offered By: Anyscale via YouTube

Tags

Machine Learning Courses, Scalability Courses, Infrastructure Management Courses, Distributed Training Courses

Course Description

Overview

Discover how to launch elastic large-scale distributed training jobs using TorchX and Ray in this 31-minute conference talk from Anyscale. Learn about the collaborative efforts between the TorchX and Ray teams to overcome traditional challenges in distributed training, including job submission, status monitoring, log aggregation, and infrastructure integration. Explore the benefits of TorchX components for code reusability and experimentation with different training infrastructures, enabling seamless transitions from research to production without additional coding. Gain insights into this experimental project that simplifies the process of scaling distributed training entirely from a notebook environment.

Syllabus

Large-scale distributed training with TorchX and Ray


Taught by

Anyscale

Related Courses

Financial Sustainability: The Numbers side of Social Enterprise
+Acumen via NovoEd
Cloud Computing Concepts: Part 2
University of Illinois at Urbana-Champaign via Coursera
Developing Repeatable Models® to Scale Your Impact
+Acumen via Independent
Managing Microsoft Windows Server Active Directory Domain Services
Microsoft via edX
Introduction aux conteneurs
Microsoft Virtual Academy via OpenClassrooms