
Designing High-Performance Scalable Middleware for HPC, AI, and Data Science in Exascale Systems and Clouds

Offered By: Linux Foundation via YouTube

Tags

High Performance Computing Courses
Data Science Courses
Machine Learning Courses
CUDA Courses
MPI Courses
Middleware Courses
Distributed Computing Courses

Course Description

Overview

Explore the design of high-performance scalable middleware for HPC, AI, and Data Science in exascale systems and clouds in this comprehensive conference talk. Delve into the challenges of supporting programming models for multi-petaflop and exaflop systems, and learn about the MVAPICH2 project's architecture and features. Discover performance improvements in startup, collectives, and applications using MVAPICH2 and TAU. Examine the benefits of new protocols and designs, including DC transport, cooperative rendezvous, and shared address space collectives. Investigate MVAPICH2-GDR's capabilities for HPC, deep learning, and data science, with a focus on CUDA-aware MPI support and on-the-fly compression. Analyze performance benchmarks for distributed TensorFlow, PyTorch, Horovod, and DeepSpeed at scale, as well as Dask architecture and cuDF merge operations. The talk closes with a look at upcoming MVAPICH2-GDR features and funding acknowledgments.
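
A central idea in the MVAPICH2-GDR portion of the talk is CUDA-aware MPI: GPU-resident buffers are handed directly to MPI calls, and the library moves the data (for example via GPUDirect RDMA) without explicit staging through host memory. As a rough illustration only, not taken from the talk, here is a minimal Python sketch using mpi4py and CuPy; it assumes mpi4py is built against a CUDA-aware MPI such as MVAPICH2-GDR, and the script name and buffer size are made up for the example.

# Hypothetical minimal CUDA-aware MPI exchange (not from the talk).
# Assumes mpi4py >= 3.1 linked against a CUDA-aware MPI build such as
# MVAPICH2-GDR, plus CuPy. Run with: mpirun -np 2 python pingpong.py
from mpi4py import MPI
import cupy as cp

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

n = 1 << 20  # 1M floats, allocated directly in GPU memory
buf = cp.arange(n, dtype=cp.float32) if rank == 0 else cp.empty(n, dtype=cp.float32)
cp.cuda.get_current_stream().synchronize()  # ensure the device buffer is ready

# With a CUDA-aware MPI, the CuPy array is passed straight to the MPI
# call; the library handles GPU-to-GPU movement with no cudaMemcpy
# staging through host memory.
if rank == 0:
    comm.Send(buf, dest=1, tag=0)
else:
    comm.Recv(buf, source=0, tag=0)
    print("rank 1 received, sum =", float(buf.sum()))

The same pattern underlies the framework benchmarks in the syllabus: Horovod and DeepSpeed issue collectives on GPU-resident tensors, so CUDA-aware transport and on-the-fly compression directly affect end-to-end training throughput.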

Syllabus

Intro
Supporting Programming Models for Multi-Petaflop and Exaflop Systems: Challenges
Designing (MPI+X) Programming Models at Exascale
Overview of the MVAPICH2 Project
MVAPICH2 Release Timeline and Downloads
Architecture of MVAPICH2 Software Family for HPC, DL/ML, and Data Science
Highlights of MVAPICH2 2.3.6-GA Release
Startup Performance on TACC Frontera
Performance of Collectives with SHARP on TACC Frontera
Performance Engineering Applications using MVAPICH2 and TAU
Overview of Some of the MVAPICH2-X Features
Impact of DC Transport Protocol on Neuron
Cooperative Rendezvous Protocols
Benefits of the New Asynchronous Progress Design: Broadwell + InfiniBand
Shared Address Space (XPMEM)-based Collectives Design
MVAPICH2-GDR 2.3.6
Highlights of Some MVAPICH2-GDR Features for HPC, DL, ML, and Data Science
MVAPICH2-GDR with CUDA-aware MPI Support
Performance with On-the-fly Compression Support in MVAPICH2-GDR
Collectives Performance on DGX2-A100 - Small Message
MVAPICH2 (MPI)-driven Infrastructure for ML/DL Training
Distributed TensorFlow on ORNL Summit (1,536 GPUs)
Distributed TensorFlow on TACC Frontera (2,048 CPU nodes)
PyTorch, Horovod and DeepSpeed at Scale: Training ResNet-50 on 256 V100 GPUs
Dask Architecture
Benchmark #1: Sum of CuPy Array and Its Transpose
Benchmark #2: cuDF Merge (TACC Frontera GPU Subsystem)
MVAPICH2-GDR Upcoming Features for HPC and DL
Funding Acknowledgments


Taught by

Linux Foundation

Related Courses

High Performance Computing
Georgia Institute of Technology via Udacity
Fundamentals of Accelerated Computing with CUDA C/C++
Nvidia via Independent
High Performance Computing for Scientists and Engineers
Indian Institute of Technology, Kharagpur via Swayam
CUDA programming Masterclass with C++
Udemy
Neural Network Programming - Deep Learning with PyTorch
YouTube