Improvements to NVIDIA CUDA and Deep Learning Libraries - Session 1
Offered By: NVIDIA via YouTube
Course Description
Syllabus
Intro
CUDA DEVELOPMENT ECOSYSTEM
POWERING THE DEEP LEARNING ECOSYSTEM
TESLA UNIVERSAL ACCELERATION PLATFORM
ACCELERATED COMPUTING IS FULL-STACK OPTIMIZATION
INTRODUCING CUDA 10.0
16 GPUS WITH 32GB MEMORY EACH
NVSWITCH: ALL-TO-ALL CONNECTIVITY
UNIFIED MEMORY + DGX-2
2X HIGHER PERFORMANCE WITH NVSWITCH
NEW PROGRAMMING MODEL FEATURES
ASYNCHRONOUS TASK GRAPHS
NEW EXECUTION MECHANISM
EXECUTION OPTIMIZATIONS
PERFORMANCE IMPACT
THE PATH TO FUSION ENERGY
VOLTA TENSOR CORE
NEW TURING TENSOR CORE
NEW TURING WARP MATRIX FUNCTIONS
CUTLASS 1.1
NVIDIA NGX: DL FOR CREATIVE APPLICATIONS
IN ADOBE PHOTOSHOP
CUDNN: GPU ACCELERATED DEEP LEARNING
IMPROVED HEURISTICS FOR CONVOLUTIONS
PERSISTENT RNN SPEEDUP ON V100
STRIDED ACTIVATION GRADIENTS
TENSORCORES WITH FP32 MODELS
MORE TENSORCORE PERFORMANCE IMPROVEMENTS
GENERAL PERFORMANCE IMPROVEMENTS
FUTURE UPDATES
Taught by
NVIDIA Developer
Related Courses
Deep Learning - Computer Vision for Beginners Using PyTorch (Packt via Coursera)
CUDA Advanced Libraries (Johns Hopkins University via Coursera)
CUDA at Scale for the Enterprise (Johns Hopkins University via Coursera)
CUDA Programming - High-Performance Computing with GPUs (freeCodeCamp)
GPU Programming (Johns Hopkins University via Coursera)