A Deep Dive on Supporting Multi-Instance GPUs in Containers and Kubernetes
Offered By: CNCF [Cloud Native Computing Foundation] via YouTube
Course Description
Overview
Explore a comprehensive conference talk on supporting Multi-Instance GPUs (MIG) in containers and Kubernetes. Dive deep into the technical aspects of supporting MIG, a feature of NVIDIA Ampere GPUs that allows a single GPU to be partitioned into smaller "MIG Devices". Learn how MIG integrates with containers, what challenges arose in building Kubernetes support for it, and how to use the technology. Discover the open-source solutions developed by NVIDIA, including the container toolkit stack and k8s-device-plugin. Gain insights into best practices for distributing MIG devices across a Kubernetes cluster and managing their lifecycle on nodes. The presentation covers GPU scaling in Kubernetes, the NVIDIA Container Toolkit, GPU allocation to pods, system-level interfaces for MIG, and challenges in MIG partitioning. It concludes with a summary of key points and future implications for GPU virtualization in cloud-native environments.
Syllabus
Intro
GPUs AND KUBERNETES Seamlessly scale up training and inference to a cluster of GPU machines
WHAT ARE MULTI-INSTANCE GPUs? Slices of a full GPU with dedicated memory and compute resources
OUTLINE
MULTI-INSTANCE GPUs (MIG)
GPUs AND CONTAINERS The NVIDIA Container Toolkit
GPUs AND KUBERNETES Allocate GPUs to pods in a Kubernetes Cluster
MIG IN CONTAINERS AND KUBERNETES
SYSTEM LEVEL INTERFACE FOR MIG
CHALLENGES WITH MIG PARTITIONING How do I create a MIG Device in the first place?
MIG PARTITION EDITOR
SUMMARY AND CONCLUSION
Taught by
CNCF [Cloud Native Computing Foundation]
Related Courses
Building Geospatial Apps on Postgres, PostGIS, & Citus at Large Scale
Microsoft via YouTube
Unlocking the Power of ML for Your JavaScript Applications with TensorFlow.js
TensorFlow via YouTube
Managing the Reactive World with RxJava - Jake Wharton
ChariotSolutions via YouTube
What's New in Grails 2.0
ChariotSolutions via YouTube
Performance Analysis of Apache Spark and Presto in Cloud Environments
Databricks via YouTube