MonoNN: Enabling a New Monolithic Optimization Space for Neural Network Inference
Offered By: USENIX via YouTube
Course Description
Overview
Explore a new approach to neural network inference optimization in this 22-minute conference talk from OSDI '24. Delve into MonoNN, a machine learning optimizing compiler that introduces a monolithic design for static neural network inference tasks on modern GPU architectures. Learn how the system fits an entire neural network into a single GPU kernel, significantly reducing non-computation overhead such as kernel launches and unlocking new optimization opportunities. Discover the key challenges MonoNN addresses, including the resource incompatibility between neural network operators placed in one kernel and the exploration of parallelism compensation strategies to resolve it. Gain insight into the schedule-independent group tuning technique that keeps the vast optimization space tractable. Examine the reported performance gains: an average speedup of 2.01×, and improvements of up to 7.3×, over state-of-the-art frameworks and compilers including TVM, TensorRT, XLA, and AStitch. Access the open-source implementation to explore this advance in GPU-centric neural network optimization, and see the sketch below for the core idea.
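To make the talk's central idea concrete, here is a minimal sketch of running two dependent operator stages inside one GPU kernel launch. This is not MonoNN's implementation: the kernel name, the stand-in stage computations (bias-add and ReLU), and the launch configuration are all illustrative assumptions. A grid-wide cooperative barrier takes the place of the kernel-launch boundary a conventional compiler would insert between the two operators.

// Hypothetical sketch, not MonoNN's code: two dependent network stages
// fused into a single cooperative kernel instead of two kernel launches.
#include <cstdio>
#include <cuda_runtime.h>
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

__global__ void mono_kernel(const float* x, float* tmp, float* y, int n) {
    cg::grid_group grid = cg::this_grid();
    // Stage 1: bias-add (stands in for any producer operator).
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        tmp[i] = x[i] + 1.0f;
    // Grid-wide barrier replaces the launch boundary between operators.
    grid.sync();
    // Stage 2: ReLU (stands in for any consumer operator).
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        y[i] = fmaxf(tmp[i], 0.0f);
}

int main() {
    int n = 1 << 20;
    float *x, *tmp, *y;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&tmp, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) x[i] = -2.0f + (i % 4);

    // Cooperative launch: the grid must fit entirely on the device,
    // the same resource constraint a monolithic-kernel tuner must respect.
    int dev = 0, numSM = 0, blocksPerSM = 0, threads = 256;
    cudaGetDevice(&dev);
    cudaDeviceGetAttribute(&numSM, cudaDevAttrMultiProcessorCount, dev);
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSM, mono_kernel,
                                                  threads, 0);
    dim3 grid(numSM * blocksPerSM), block(threads);
    void* args[] = {&x, &tmp, &y, &n};
    cudaLaunchCooperativeKernel((void*)mono_kernel, grid, block, args, 0, 0);
    cudaDeviceSynchronize();
    printf("y[0]=%.1f y[3]=%.1f\n", y[0], y[3]);  // expect 0.0 and 2.0
    cudaFree(x); cudaFree(tmp); cudaFree(y);
    return 0;
}

Note that grid.sync() requires a GPU and driver supporting cooperative launch; compile with, for example, nvcc -arch=sm_70. MonoNN's actual design layers operator scheduling, parallelism compensation, and group tuning on top of this basic single-kernel execution model.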
Syllabus
OSDI '24 - MonoNN: Enabling a New Monolithic Optimization Space for Neural Network Inference...
Taught by
USENIX
Related Courses
Optimize TensorFlow Models For Deployment with TensorRT
Coursera Project Network via Coursera
Jetson Xavier NX Developer Kit - Edge AI Supercomputer Features and Applications
Nvidia via YouTube
NVIDIA Jetson: Enabling AI-Powered Autonomous Machines at Scale
Nvidia via YouTube
Jetson AGX Xavier: Architecture and Applications for Autonomous Machines
Nvidia via YouTube
Streamline Deep Learning for Video Analytics with DeepStream SDK 2.0
Nvidia via YouTube