Getting Started with OpenACC - Part II
Offered By: Nvidia via YouTube
Course Description
Overview
Dive into the second part of a comprehensive tutorial on OpenACC, presented by Jeff Larkin from Nvidia. Explore advanced concepts in parallel programming, including Jacobi Iteration implementation, compiler output analysis, and optimization techniques. Learn to offload parallel kernels, manage data transfers efficiently, and utilize data directives for improved performance. Discover how to integrate OpenACC with MPI for distributed computing, and gain valuable tips and tricks for enhancing your GPU-accelerated code. Master the use of OpenACC directives such as update and host_data, and understand the importance of the C restrict keyword in optimizing performance.
Syllabus
Intro
Example: Jacobi Iteration
Jacobi Iteration: C Code
Jacobi Iteration: OpenACC C Code
PGI Accelerator Compiler output (C)
What went wrong? (Setting the PGI_ACC_TIME environment variable to '1')
Offloading a Parallel Kernel
Separating Data from Computation
Excessive Data Transfers
Defining data regions
Data Clauses: copy(list) allocates memory on the GPU and copies data from host to GPU when entering the region, and copies data back to the host when exiting the region
Array Shaping
Jacobi Iteration: Data Directives
Execution Time (lower is better)
Further speedups
Calling MPI with OpenACC (Standard MPI)
OpenACC update Directive
OpenACC host_data Directive
Calling MPI with OpenACC (GPU-aware MPI)
C tip: the restrict keyword
Tips and Tricks (cont.)
Taught by
NVIDIA Developer
Related Courses
High Performance Computing (Georgia Institute of Technology via Udacity)
Fundamentals of Accelerated Computing with CUDA C/C++ (Nvidia via Independent)
High Performance Computing for Scientists and Engineers (Indian Institute of Technology, Kharagpur via Swayam)
CUDA programming Masterclass with C++ (Udemy)
Neural Network Programming - Deep Learning with PyTorch (YouTube)