YoVDO

Increasing GPU Utilization on Kubernetes Clusters for AI/ML Workloads

Offered By: CNCF [Cloud Native Computing Foundation] via YouTube

Tags

Kubernetes Courses
Cluster Scaling Courses

Course Description

Overview

Explore strategies for optimizing GPU utilization in large-scale Kubernetes clusters dedicated to AI/ML workloads in this informative conference talk. Learn how to maximize the efficiency of 10,000 A100 GPUs across 20 on-premises Kubernetes clusters through various open-source solutions. Discover hardware-level optimizations like NVIDIA MIG, scheduler improvements with Volcano, application-layer enhancements using PaddlePaddle for smarter training job distribution, and multi-cluster management with Armada. Gain valuable insights into pitfalls, best practices, and recommendations based on real-world experiences from four large-scale projects completed in Q4 2023. Enhance your understanding of complex GPU optimization setups and their practical implementation in AI/ML environments.

Syllabus

Increasing GPU Utilisation on K8s Clusters Dedicated for AI/ML Workloads

Taught by

CNCF [Cloud Native Computing Foundation]

Related Courses

Red Hat Certified Specialist in OpenShift 4.2 Administration Exam Prep (ex280)
A Cloud Guru
Securing, Monitoring, and Scaling Kubernetes Clusters
LearnQuest via Coursera
Best practice per il data warehousing con Amazon Redshift (Italiano) | Best Practices for Data Warehousing with Amazon Redshift (Italian)
Amazon Web Services via AWS Skill Builder
Best Practices for Data Warehousing with Amazon Redshift (French)
Amazon Web Services via AWS Skill Builder