YoVDO

Enabling HPC and ML Workloads with Latest Kubernetes Job Features

Offered By: CNCF [Cloud Native Computing Foundation] via YouTube

Tags

Kubernetes Courses, Machine Learning Courses, High Performance Computing Courses, Distributed Computing Courses, Batch Processing Courses

Course Description

Overview

Explore the latest Kubernetes Job API features for running distributed Batch, AI, and HPC workloads at scale in this conference talk. Learn how Indexed Jobs simplify parallel workloads requiring pod-to-pod communication, with examples from DeepMind's distributed machine learning applications. Discover the Flux Operator's ability to orchestrate HPC workloads by creating a "Mini Cluster" within Kubernetes. Understand how Pod Failure Policy can maintain job execution despite pod disruptions while optimizing costs. Gain insights from real-world experiences at DeepMind and Lawrence Livermore National Laboratory to enhance your ability to manage complex computational workloads in Kubernetes environments.
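As a rough illustration of the two Job API features named above, the sketch below builds a `batch/v1` Job manifest as a plain Python dictionary. It sets `completionMode: Indexed` so each pod receives its own completion index, and adds a `podFailurePolicy` rule that ignores failures caused by pod disruption so they do not count against `backoffLimit`. The helper name `build_indexed_job`, the job name, the image, and the worker count are hypothetical choices for this example and are not taken from the talk. Pod-to-pod networking, which typically also needs a headless Service, is left out.

```python
import json


def build_indexed_job(name: str, image: str, workers: int) -> dict:
    """Return a batch/v1 Job manifest as a dict (hypothetical helper, not from the talk)."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Indexed mode: each pod gets a completion index from 0 to workers - 1,
            # exposed to the container as the JOB_COMPLETION_INDEX environment variable.
            "completionMode": "Indexed",
            "completions": workers,
            "parallelism": workers,
            "backoffLimit": 3,
            # Pod failure policy: do not count disruption-caused failures
            # (for example preemption or node drain) toward backoffLimit.
            "podFailurePolicy": {
                "rules": [
                    {
                        "action": "Ignore",
                        "onPodConditions": [
                            {"type": "DisruptionTarget", "status": "True"}
                        ],
                    }
                ]
            },
            "template": {
                "spec": {
                    # A pod failure policy requires restartPolicy: Never.
                    "restartPolicy": "Never",
                    "containers": [{"name": "worker", "image": image}],
                }
            },
        },
    }


if __name__ == "__main__":
    # Placeholder values; print the manifest as JSON, which kubectl also accepts.
    print(json.dumps(build_indexed_job("demo-workers", "busybox:1.36", 4), indent=2))
```

The printed JSON could be saved to a file and applied with `kubectl apply -f`, assuming a cluster version where both features are available.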

Syllabus

Enabling HPC & ML Workloads with the Latest Kubernetes Job Features - Michał Woźniak & Vanessa Sochat


Taught by

CNCF [Cloud Native Computing Foundation]

Related Courses

Cloud Computing Concepts, Part 1
University of Illinois at Urbana-Champaign via Coursera
Cloud Computing Concepts: Part 2
University of Illinois at Urbana-Champaign via Coursera
Reliable Distributed Algorithms - Part 1
KTH Royal Institute of Technology via edX
Introduction to Apache Spark and AWS
University of London International Programmes via Coursera
Réalisez des calculs distribués sur des données massives (Perform Distributed Computations on Massive Data)
CentraleSupélec via OpenClassrooms