Large-Scale Distributed Training with TorchX and Ray
Offered By: Anyscale via YouTube
Course Description
Overview
Discover how to launch elastic, large-scale distributed training jobs using TorchX and Ray in this 31-minute conference talk from Anyscale. Learn how the TorchX and Ray teams collaborated to address traditional pain points in distributed training: job submission, status monitoring, log aggregation, and infrastructure integration. Explore how TorchX components promote code reuse and make it easy to experiment with different training infrastructures, enabling a transition from research to production without additional coding. Gain insights into this experimental project, which lets you scale distributed training entirely from a notebook environment.
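As a rough illustration of the workflow the talk describes, the sketch below uses TorchX's Python runner API to submit a distributed PyTorch job to a Ray cluster, which is the kind of notebook-driven launch the talk covers. The training script name, container image, job sizing, and the Ray scheduler config key are illustrative assumptions, not details taken from the talk.

```python
# Illustrative sketch (not from the talk): submitting a distributed PyTorch
# job to a Ray cluster through TorchX, e.g. from a notebook.
# Script name, image, job size, and cfg values below are assumptions.
from torchx.components.dist import ddp
from torchx.runner import get_runner

# Build an AppDef for a DDP job around a user training script:
# j="2x2" means 2 nodes with 2 processes per node.
app = ddp(
    script="train.py",                    # hypothetical training script
    image="my-registry/trainer:latest",   # hypothetical container image
    j="2x2",
    gpu=1,
)

with get_runner() as runner:
    # Submit to the Ray scheduler; the cfg key name follows TorchX's Ray
    # scheduler options as best understood here.
    app_handle = runner.run(
        app,
        scheduler="ray",
        cfg={"cluster_config_file": "ray_cluster.yaml"},
    )

    # The returned handle supports the status monitoring and log aggregation
    # the talk mentions.
    print(runner.status(app_handle))
    runner.wait(app_handle)
```

The same job definition can be resubmitted against a different scheduler backend by changing only the `scheduler` argument, which is the code-reuse point the talk highlights.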
Syllabus
Large-scale distributed training with TorchX and Ray
Taught by
Anyscale
Related Courses
Custom and Distributed Training with TensorFlow (DeepLearning.AI via Coursera)
Architecting Production-ready ML Models Using Google Cloud ML Engine (Pluralsight)
Building End-to-end Machine Learning Workflows with Kubeflow (Pluralsight)
Deploying PyTorch Models in Production: PyTorch Playbook (Pluralsight)
Inside TensorFlow (TensorFlow via YouTube)