
CUDA Crash Course

Offered By: YouTube

Tags

CUDA, Algorithms, Convolution, Parallel Computing, GPU Programming, Matrix Multiplication

Course Description

Overview

A 7-hour crash course on CUDA programming that progresses from basic vector addition to GPU performance optimizations. The lessons implement and optimize matrix multiplication, sum reduction, convolution, and histogram kernels in CUDA, and cover unified memory, cache tiling, memory coalescing, constant memory, and the cuBLAS library. Environment setup is shown for both Windows (Visual Studio 2017) and Linux. Later lessons cover thinking spatially, handling non-perfect input sizes, querying device properties, profiling with clock(), an OpenACC version of matrix multiplication, and side-by-side comparisons of the matrix multiplication and sum reduction implementations.

Syllabus

CUDA Crash Course: Vector Addition.
CUDA Crash Course: Unified Memory Vector Add.
CUDA Crash Course: Matrix Multiplication.
CUDA Crash Course: Cache Tiled Matrix Multiplication.
CUDA Crash Course: Why Coalescing Matters.
CUDA Crash Course: cuBLAS Vector Add.
CUDA Crash Course: cuBLAS Matrix Multiplication.
CUDA Crash Course: Sum Reduction Part 1.
CUDA Crash Course: Sum Reduction Part 2.
CUDA Crash Course: Sum Reduction Part 3.
CUDA Crash Course: Sum Reduction Part 4.
CUDA Crash Course: Sum Reduction Part 5.
CUDA Crash Course: Visual Studio 2017 Environment Setup.
CUDA Crash Course: Programming in Linux.
CUDA Crash Course: Video Corrections.
CUDA Crash Course: Sum Reduction Part 6.
CUDA Crash Course: Naive 1-D Convolution.
CUDA Crash Course: 1-D Convolution with Constant Memory.
CUDA Crash Course: Tiled 1-D Convolution.
CUDA Crash Course: 1-D Convolution Cache Simplification.
CUDA Crash Course: 2-D Convolution.
CUDA Crash Course: Thinking Spatially.
CUDA Crash Course: Optimizing Histogram Kernels.
CUDA Crash Course: Comparing Matrix Multiplication Implementations.
CUDA Crash Course: Comparing Sum Reduction Implementations.
CUDA Crash Course: Handling Non-Perfect Input Sizes.
CUDA Crash Course: OpenACC Matrix Multiplication.
CUDA Crash Course: Device Properties.
CUDA Crash Course: Profiling with clock().
CUDA Crash Course: GPU Performance Optimizations Part 1.


Taught by

CoffeeBeforeArch

Related Courses

CUDA Advanced Libraries (Johns Hopkins University via Coursera)
CUDA at Scale for the Enterprise (Johns Hopkins University via Coursera)
Parallel Computing with CUDA (Pluralsight)
Learn to Write Unity Compute Shaders (Udemy)
CUDA programming Masterclass with C++ (Udemy)