YoVDO

Maximizing GPU Utilization Over Multi-Cluster - Challenges and Solutions for Cloud-Native AI Platform

Offered By: CNCF [Cloud Native Computing Foundation] via YouTube

Tags

Kubernetes Courses, Volcano Courses, Karmada Courses

Course Description

Overview

Explore the challenges and solutions for maximizing GPU utilization in multi-cluster environments for cloud-native AI platforms in this 27-minute conference talk by William Wang and Hongcai Ren from Huawei. Delve into the complexities of managing large-scale, heterogeneous GPU environments across multiple Kubernetes clusters and data centers. Learn about approaches to reducing resource fragmentation and operational costs and to scheduling workloads across clusters using tools like Karmada and Volcano. Discover strategies for intelligent GPU workload scheduling, supporting cluster failover, maintaining two-level scheduling consistency, and balancing utilization with Quality of Service (QoS) for workloads with varying priorities. Gain insights into optimizing AI/ML workloads on Kubernetes and improving the efficiency of cloud-native AI platforms.

Syllabus

Maximizing GPU Utilization Over Multi-Cluster: Challenges and Solutions for Cloud-Native AI Platform


Taught by

CNCF [Cloud Native Computing Foundation]

Related Courses

Karmada Cross-Cluster Elastic Scaling: Scenarios and Implementation Analysis
CNCF [Cloud Native Computing Foundation] via YouTube
Simplifying Multi-cluster Kubernetes Management with Karmada
CNCF [Cloud Native Computing Foundation] via YouTube
Managing Multi-Cluster with Karmada - Session 6
CNCF [Cloud Native Computing Foundation] via YouTube
Karmada and ErieCanal Multi-Cluster Scheduling - Session 4
CNCF [Cloud Native Computing Foundation] via YouTube
Sailing Multi-Cloud Traffic Management with Karmada
CNCF [Cloud Native Computing Foundation] via YouTube