Investigating Checkpoint and Restore for GPU-Accelerated Containers
Offered By: Linux Foundation via YouTube
Course Description
Overview
This 39-minute conference talk by Nan Lu (Microsoft) and Adrian Reber (Red Hat) explores Checkpoint and Restore for GPU-accelerated containers. The speakers present early investigations and proof-of-concepts for this nascent technology, which aims to make better use of costly GPUs and time-intensive model training runs. The talk covers the functionality that already exists, the gaps in the ecosystem that must be closed to enable the solution, and the challenges and opportunities of applying Checkpoint and Restore to GPU-powered containers, including how the approach could change resource management in high-performance computing environments.
Syllabus
Investigating Checkpoint and Restore for GPU-Accelerated Containers - Nan Lu & Adrian Reber
Taught by
Linux Foundation
Related Courses
High Performance Computing - Georgia Institute of Technology via Udacity
Introduction to Parallel Programming Using OpenMP and MPI - Tomsk State University via Coursera
High Performance Computing in the Cloud - Dublin City University via FutureLearn
Production Machine Learning Systems - Google Cloud via Coursera
LAFF-On Programming for High Performance - The University of Texas at Austin via edX