YoVDO

On-Demand Systems and Scaled Training Using the JobSet API

Offered By: CNCF [Cloud Native Computing Foundation] via YouTube

Tags

Cloud Native Computing Courses, Machine Learning Courses, PyTorch Courses, High Performance Computing Courses, Scalability Courses, TPUs Courses

Course Description

Overview

Explore the JobSet API for orchestrating complex workflows in ephemeral environments in this conference talk. Learn how to use it to manage large-scale machine learning model training and to build on-demand HPC systems. Learn about automating the setup of training workloads with common frameworks like PyTorch, and see results from large-scale experiments that use thousands of TPU chips. The talk also covers streamlining the creation of on-demand HPC systems and establishing standardized environments for experimental comparisons. Finally, it explains how the JobSet API addresses challenges in job orchestration, ensuring scalability and high resource utilization for heterogeneous components in cloud-native computing environments.

Syllabus

On-Demand Systems and Scaled Training Using the JobSet API - Abdullah Gharaibeh & Vanessa Sochat


Taught by

CNCF [Cloud Native Computing Foundation]

Related Courses

High Performance Computing
Georgia Institute of Technology via Udacity
Introduction to Parallel Programming Using OpenMP and MPI
Tomsk State University via Coursera
High Performance Computing in the Cloud
Dublin City University via FutureLearn
Production Machine Learning Systems
Google Cloud via Coursera
LAFF-On Programming for High Performance
The University of Texas at Austin via edX