Increasing GPU Utilization on Kubernetes Clusters for AI/ML Workloads
Offered By: CNCF [Cloud Native Computing Foundation] via YouTube
Course Description
Overview
Explore strategies for optimizing GPU utilization in large-scale Kubernetes clusters dedicated to AI/ML workloads in this conference talk. Learn how to get the most out of 10,000 NVIDIA A100 GPUs spread across 20 on-premises Kubernetes clusters using open-source solutions at several layers: hardware-level partitioning with NVIDIA MIG (Multi-Instance GPU), scheduler improvements with Volcano, application-layer enhancements using PaddlePaddle for smarter distribution of training jobs, and multi-cluster management with Armada. Gain insights into pitfalls, best practices, and recommendations drawn from four large-scale projects completed in Q4 2023, and see how these complex GPU optimization setups are implemented in practice in AI/ML environments.
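To make the combination of the hardware and scheduler layers more concrete, here is a minimal sketch (not taken from the talk) that builds a Kubernetes Pod manifest requesting a single MIG slice and handing the pod to Volcano. It assumes the NVIDIA device plugin runs with the `mixed` MIG strategy, so slices are exposed as extended resources named `nvidia.com/mig-<profile>`; that `1g.5gb` (a profile of the 40 GB A100) is the desired slice; and that Volcano is installed under its default scheduler name `volcano`. The pod name and container image are hypothetical placeholders.

```python
import json


def build_mig_pod(name: str, image: str, mig_profile: str = "1g.5gb",
                  scheduler: str = "volcano") -> dict:
    """Return a Pod manifest (as a dict) that requests one MIG slice.

    Assumes the NVIDIA device plugin's "mixed" MIG strategy, which names
    each slice type as an extended resource like nvidia.com/mig-1g.5gb.
    """
    resource = f"nvidia.com/mig-{mig_profile}"
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Hand the pod to Volcano instead of the default kube-scheduler.
            "schedulerName": scheduler,
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "trainer",
                    "image": image,
                    # Extended resources are requested via limits;
                    # Kubernetes sets the request equal to the limit.
                    "resources": {"limits": {resource: 1}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical pod name and image, for illustration only.
    pod = build_mig_pod("mig-demo", "example.com/trainer:latest")
    print(json.dumps(pod, indent=2))
```

Saving the printed JSON to a file and running `kubectl apply -f` on it would submit the pod. Requesting a 1g slice instead of a whole `nvidia.com/gpu` lets up to seven small jobs share one 40 GB A100, which is the kind of packing gain MIG is meant to provide.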
Syllabus
Increasing GPU Utilisation on K8s Clusters Dedicated for AI/ML Workloads
Taught by
CNCF [Cloud Native Computing Foundation]
Related Courses
Red Hat Certified Specialist in OpenShift 4.2 Administration Exam Prep (ex280), A Cloud Guru
Securing, Monitoring, and Scaling Kubernetes Clusters, LearnQuest via Coursera
Best Practices for Data Warehousing with Amazon Redshift (Italian), Amazon Web Services via AWS Skill Builder
Best Practices for Data Warehousing with Amazon Redshift (French), Amazon Web Services via AWS Skill Builder