VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores
Offered By: Scalable Parallel Computing Lab, SPCL @ ETH Zurich via YouTube
Course Description
Overview
This conference talk from the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC23) presents VENOM, an approach to sparse tensor computation on GPUs. It introduces the V:N:M format, which enables execution of arbitrary N:M sparsity ratios on NVIDIA's Sparse Tensor Cores (SPTCs) and so overcomes the limitation of the current 2:4 format (at most two nonzero values in every group of four consecutive elements). The talk then covers Spatha, a high-performance sparse library designed to exploit the new format, which achieves up to 37x speedup over cuBLAS, and a second-order pruning technique that allows high sparsity ratios in modern transformers with minimal accuracy loss. Topics include GPU Tensor Cores, sparse formats, sparse linear algebra, and evaluation methods.
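To make the N:M idea concrete, here is a minimal NumPy sketch of plain magnitude-based N:M pruning: in every group of M consecutive values along a row, only the N largest-magnitude values are kept. This is an illustration only. It is not the V:N:M format, the Spatha library, or the second-order pruning method from the talk, and the function name `prune_n_m` is hypothetical.

```python
import numpy as np

def prune_n_m(weights, n=2, m=4):
    """Keep the n largest-magnitude values in each group of m consecutive
    elements along each row and set the rest to zero (hypothetical helper)."""
    rows, cols = weights.shape
    if cols % m != 0:
        raise ValueError("number of columns must be a multiple of m")
    # View each row as consecutive groups of m elements.
    groups = weights.reshape(rows, cols // m, m)
    # Indices of the (m - n) smallest-magnitude entries in every group.
    drop = np.argsort(np.abs(groups), axis=-1)[..., : m - n]
    mask = np.ones_like(groups, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=-1)
    # Zero out the dropped entries and restore the original shape.
    return np.where(mask, groups, 0.0).reshape(rows, cols)

w = np.array([[0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.8, 0.01]])
pruned = prune_n_m(w, n=2, m=4)
print(pruned)
```

With `n=2, m=4` the result follows the 2:4 pattern that current Sparse Tensor Cores accept. Other ratios, such as `n=2, m=8`, give higher sparsity, which is the regime the V:N:M format is meant to make executable on the same hardware.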
Syllabus
Intro
GPU Tensor Cores
Sparse Formats
Sparse Linear Algebra
Second Order Pruning
Evaluation
Taught by
Scalable Parallel Computing Lab, SPCL @ ETH Zurich
Related Courses
Neural Networks for Machine Learning, University of Toronto via Coursera
機器學習技法 (Machine Learning Techniques), National Taiwan University via Coursera
Machine Learning Capstone: An Intelligent Application with Deep Learning, University of Washington via Coursera
Прикладные задачи анализа данных (Applied Problems in Data Analysis), Moscow Institute of Physics and Technology via Coursera
Leading Ambitious Teaching and Learning, Microsoft via edX