Getting Started with OpenACC - Part II

Offered By: Nvidia via YouTube

Tags

CUDA Courses
Parallel Programming Courses
Performance Tuning Courses
GPU Acceleration Courses

Course Description

Overview

Dive into the second part of a comprehensive tutorial on OpenACC, presented by Jeff Larkin from Nvidia. Explore advanced concepts in parallel programming, including Jacobi Iteration implementation, compiler output analysis, and optimization techniques. Learn to offload parallel kernels, manage data transfers efficiently, and utilize data directives for improved performance. Discover how to integrate OpenACC with MPI for distributed computing, and gain valuable tips and tricks for enhancing your GPU-accelerated code. Master the use of OpenACC directives such as update and host_data, and understand the importance of the C restrict keyword in optimizing performance.
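
To make the topics above concrete, here is a minimal sketch of the Jacobi iteration pattern in OpenACC C. It is a reconstruction under stated assumptions rather than the code from the slides: the grid size, tolerance, and names (NX, NY, jacobi) are illustrative, and the slides may use different directives. The sketch shows three of the points the description highlights: the C restrict keyword (so the compiler knows A and Anew do not alias), offloading the stencil loops as parallel kernels, and a data region that keeps both arrays resident on the GPU instead of transferring them every sweep.

/* Minimal sketch of the Jacobi relaxation pattern discussed in the talk.
 * Grid size, tolerance, and names are illustrative assumptions. */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

#define NX 1024
#define NY 1024

static int jacobi(float *restrict A, float *restrict Anew,
                  int iter_max, float tol)
{
    int   iter = 0;
    float err  = tol + 1.0f;

    /* Copy A to the GPU once, allocate Anew there, copy A back at the end. */
    #pragma acc data copy(A[0:NX*NY]) create(Anew[0:NX*NY])
    while (err > tol && iter < iter_max) {
        err = 0.0f;

        /* Stencil sweep offloaded to the GPU, with a max-reduction on err. */
        #pragma acc parallel loop collapse(2) reduction(max:err)
        for (int j = 1; j < NY - 1; j++)
            for (int i = 1; i < NX - 1; i++) {
                Anew[j*NX + i] = 0.25f * (A[j*NX + i + 1] + A[j*NX + i - 1]
                                        + A[(j+1)*NX + i] + A[(j-1)*NX + i]);
                err = fmaxf(err, fabsf(Anew[j*NX + i] - A[j*NX + i]));
            }

        /* Copy the new values back into A, still on the GPU. */
        #pragma acc parallel loop collapse(2)
        for (int j = 1; j < NY - 1; j++)
            for (int i = 1; i < NX - 1; i++)
                A[j*NX + i] = Anew[j*NX + i];

        iter++;
    }
    return iter;
}

int main(void)
{
    float *A    = calloc(NX * NY, sizeof *A);
    float *Anew = calloc(NX * NY, sizeof *Anew);

    /* Hot boundary on the top row; the interior starts at zero. */
    for (int i = 0; i < NX; i++) A[i] = 1.0f;

    int iters = jacobi(A, Anew, 1000, 1.0e-4f);
    printf("stopped after %d iterations\n", iters);

    free(A);
    free(Anew);
    return 0;
}

Built with an OpenACC compiler (for example pgcc -acc in the PGI toolchain the syllabus refers to, or nvc -acc today), the pragmas offload the loops; without -acc they are ignored and the code runs serially on the host.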

Syllabus

Intro
Example: Jacobi Iteration
Jacobi Iteration: C Code
Jacobi Iteration: OpenACC C Code
PGI Accelerator Compiler output (C)
What went wrong? Set the PGI_ACC_TIME environment variable to '1'
Offloading a Parallel Kernel
Separating Data from Computation
Excessive Data Transfers
Defining data regions
Data Clauses: copy(list) allocates memory on the GPU, copies data from the host to the GPU when entering the region, and copies data back to the host when exiting the region (see the data regions in the sketches above and after this syllabus)
Array Shaping
Jacobi Iteration: Data Directives
Execution Time (lower is better)
Further speedups
Calling MPI with OpenACC (Standard MPI)
OpenACC update Directive
OpenACC host_data Directive
Calling MPI with OpenACC (GPU-aware MPI)
C tip: the restrict keyword
Tips and Tricks (cont.)
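
The MPI items in the syllabus come down to two interoperability patterns, sketched below under the assumption that the array u is already present on the GPU inside an enclosing data region. The names (exchange_halo_standard, exchange_halo_gpu_aware, u, nx, ny) and the single-direction exchange are illustrative, not taken from the slides. With a standard MPI library, the update directive stages boundary rows through host memory around the MPI call; with a GPU-aware MPI library, host_data use_device hands MPI the device address of u so no staging copy is needed.

#include <mpi.h>

/* Standard MPI: stage halo rows through the host with the update directive.
 * Sends the last interior row to the rank below and receives the top halo
 * row from the rank above; a real code exchanges both directions. */
void exchange_halo_standard(double *restrict u, int nx, int ny,
                            int up, int down)
{
    /* Copy the outgoing row from the GPU to the host. */
    #pragma acc update host(u[(ny-2)*nx:nx])
    MPI_Sendrecv(&u[(ny-2)*nx], nx, MPI_DOUBLE, down, 0,
                 &u[0],         nx, MPI_DOUBLE, up,   0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    /* Copy the received halo row from the host back to the GPU. */
    #pragma acc update device(u[0:nx])
}

/* GPU-aware MPI: inside host_data use_device, u refers to its device
 * address, so MPI reads and writes GPU memory directly. */
void exchange_halo_gpu_aware(double *restrict u, int nx, int ny,
                             int up, int down)
{
    #pragma acc host_data use_device(u)
    {
        MPI_Sendrecv(&u[(ny-2)*nx], nx, MPI_DOUBLE, down, 0,
                     &u[0],         nx, MPI_DOUBLE, up,   0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
}

The GPU-aware path removes the host round trip entirely, which is why it is worth the extra setup, but it requires an MPI build with CUDA support.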


Taught by

NVIDIA Developer

Related Courses

High Performance Computing
Georgia Institute of Technology via Udacity
Fundamentals of Accelerated Computing with CUDA C/C++
Nvidia via Independent
High Performance Computing for Scientists and Engineers
Indian Institute of Technology, Kharagpur via Swayam
CUDA programming Masterclass with C++
Udemy
Neural Network Programming - Deep Learning with PyTorch
YouTube