YoVDO

A Deep Dive on Supporting Multi-Instance GPUs in Containers and Kubernetes

Offered By: CNCF [Cloud Native Computing Foundation] via YouTube

Tags

Conference Talks Courses, Kubernetes Courses, Containers Courses

Course Description

Overview

Explore a comprehensive conference talk on supporting Multi-Instance GPUs (MIG) in containers and Kubernetes. Dive deep into the technical aspects of implementing MIG support, a feature of NVIDIA Ampere GPUs that allows partitioning a GPU into smaller "MIG Devices". Learn about the integration of MIG with containers, the challenges faced in building Kubernetes support, and how to utilize this technology. Discover the open-source solutions developed by NVIDIA, including the container toolkit stack and k8s-device-plugin. Gain insights into best practices for distributing MIG devices across a Kubernetes cluster and managing their lifecycle on nodes. The presentation covers topics such as GPU scaling in Kubernetes, the NVIDIA Container Toolkit, GPU allocation to pods, system-level interfaces for MIG, and challenges in MIG partitioning. Conclude with a summary of key points and future implications for GPU virtualization in cloud-native environments.
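
As a rough illustration of what allocating a MIG device to a pod can look like, the sketch below builds a pod manifest as a plain Python dictionary and prints it as JSON, which kubectl also accepts. It is not taken from the talk. It assumes the device plugin advertises MIG devices under resource names of the form nvidia.com/mig-<profile> (for example nvidia.com/mig-1g.5gb), which depends on how the plugin is configured on the cluster; the helper name mig_pod_manifest, the pod name, and the image are hypothetical placeholders.

```python
import json


def mig_pod_manifest(name, image, profile="1g.5gb", count=1):
    """Build a pod manifest (as a plain dict) that requests MIG devices.

    Assumption: the cluster's device plugin exposes MIG devices as
    extended resources named "nvidia.com/mig-<profile>".
    """
    resource = f"nvidia.com/mig-{profile}"
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,  # hypothetical image name
                    # Extended resources are requested through limits.
                    "resources": {"limits": {resource: count}},
                }
            ],
        },
    }


# Hypothetical pod name and image, used only for illustration.
manifest = mig_pod_manifest("mig-demo", "example.com/cuda-app:latest")
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and submitted with kubectl apply -f, provided the cluster actually advertises a resource with that name.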

Syllabus

Intro
GPUs AND KUBERNETES Seamlessly scale up training and inference to a cluster of GPU machines
WHAT ARE MULTI-INSTANCE GPUs? Slices of a full GPU with dedicated memory and compute resources
OUTLINE
MULTI-INSTANCE GPUs (MIG)
GPUs AND CONTAINERS The NVIDIA Container Toolkit
GPUs AND KUBERNETES Allocate GPUs to pods in a Kubernetes Cluster
MIG IN CONTAINERS AND KUBERNETES
SYSTEM LEVEL INTERFACE FOR MIG
CHALLENGES WITH MIG PARTITIONING How do I create a MIG Device in the first place?
MIG PARTITION EDITOR
SUMMARY AND CONCLUSION


Taught by

CNCF [Cloud Native Computing Foundation]
