YoVDO

Scalable Multi-Node AI Workloads in Multi-Tenant AI Clouds Using SDN K8s Networking

Offered By: Linux Foundation via YouTube

Tags

Kubernetes Courses
GPU Computing Courses
Open vSwitch Courses
Multi-Tenancy Courses

Course Description

Overview

This 39-minute conference talk by Girish Moodalbail and Leonid Grossman of NVIDIA covers the challenges of running multi-node AI workloads in multi-tenant cloud environments and ways to address them. Multi-node AI workloads need efficient bandwidth utilization, low latency, and minimal jitter; without them, GPUs sit underutilized. An AI cloud must also isolate each tenant's network and manage shared resources so that multiple users can run workloads concurrently. The talk describes how to achieve network isolation with an overlay virtual network topology and how to allocate bandwidth efficiently with end-to-end QoS. It shows how open source SDN projects, including Open vSwitch (OVS), Open Virtual Network (OVN), and the OVN-Kubernetes CNI, can be used to meet these requirements, and it discusses the significant performance improvements achievable with OVS-offloadable hardware in scalable multi-node AI workload scenarios.

Syllabus

Scalable Multi-Node AI Workloads in Multi-Tenant AI Clouds U...- Girish Moodalbail & Leonid Grossman


Taught by

Linux Foundation

Related Courses

Introduction to Cloud Infrastructure Technologies
Linux Foundation via edX
Scalable Microservices with Kubernetes
Google via Udacity
Google Cloud Fundamentals: Core Infrastructure
Google via Coursera
Introduction to Kubernetes
Linux Foundation via edX
Fundamentals of Containers, Kubernetes, and Red Hat OpenShift
Red Hat via edX