MonoNN: Enabling a New Monolithic Optimization Space for Neural Network Inference

Offered By: USENIX via YouTube

Tags

TensorRT Courses

Course Description

Overview

Explore a new approach to neural network inference optimization in this 22-minute conference talk from OSDI '24. Delve into MonoNN, a machine learning optimizing compiler that introduces a monolithic design for static neural network inference tasks on modern GPU architectures. Learn how the system fits an entire neural network into a single GPU kernel, significantly reducing non-computation overhead (such as per-operator kernel launch costs) and unlocking new optimization opportunities. Discover the key challenges MonoNN addresses, including resource incompatibility between neural network operators, and how it explores parallelism compensation strategies. Gain insights into the schedule-independent group tuning technique that keeps the vast optimization space tractable. Examine the performance gains MonoNN achieves: an average speedup of 2.01× over state-of-the-art frameworks and compilers, and peak speedups of up to 7.3× compared with leading solutions such as TVM, TensorRT, XLA, and AStitch. Access the open-source implementation to further explore this advancement in GPU-centric neural network optimization.
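To make the monolithic-kernel idea concrete, here is a minimal CUDA sketch (not MonoNN's actual code): two dependent stages that a conventional compiler would emit as two separate kernel launches instead run inside one cooperative launch, with a grid-wide barrier replacing the implicit synchronization between launches. All names and sizes (monolithic_net, the stage computations, n) are illustrative assumptions.

```cuda
// Toy illustration of a "monolithic" kernel: two dependent stages in ONE
// launch, synchronized with a grid-wide barrier instead of a second launch.
// Requires a GPU with cooperative-launch support; compile with e.g.
//   nvcc -arch=sm_70 monolithic_sketch.cu
#include <cstdio>
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

__global__ void monolithic_net(const float* x, float* h, float* y, int n) {
    cg::grid_group grid = cg::this_grid();

    // Stage 1 (stand-in for a real operator): affine transform.
    for (size_t i = grid.thread_rank(); i < (size_t)n; i += grid.size())
        h[i] = 2.0f * x[i] + 1.0f;

    // Grid-wide sync: stage 2 reads h[] values written by other blocks,
    // so the barrier below is genuinely required.
    grid.sync();

    // Stage 2: ReLU over a shifted neighbor (a cross-block dependency).
    for (size_t i = grid.thread_rank(); i < (size_t)n; i += grid.size()) {
        float v = h[(i + 1) % n];
        y[i] = v > 0.0f ? v : 0.0f;
    }
}

int main() {
    int n = 1 << 20;
    float *x, *h, *y;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&h, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) x[i] = -1.0f + (float)(i % 3);

    // Cooperative launch requires all blocks to be resident at once,
    // so size the grid from an occupancy query.
    int dev = 0, sms = 0, blocks_per_sm = 0, block = 256;
    cudaDeviceGetAttribute(&sms, cudaDevAttrMultiProcessorCount, dev);
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocks_per_sm,
                                                  monolithic_net, block, 0);
    int grid_dim = sms * blocks_per_sm;

    void* args[] = {&x, &h, &y, &n};
    cudaLaunchCooperativeKernel((void*)monolithic_net, grid_dim, block, args);
    cudaDeviceSynchronize();
    printf("y[0] = %.1f, y[1] = %.1f\n", y[0], y[1]);  // expect 1.0, 3.0
    cudaFree(x); cudaFree(h); cudaFree(y);
    return 0;
}
```

Note the trade-off this sketch exposes: a single kernel pays the launch overhead once, but every fused operator must then share one thread, register, and shared-memory configuration, which is the resource-incompatibility challenge the talk describes MonoNN addressing.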

Syllabus

OSDI '24 - MonoNN: Enabling a New Monolithic Optimization Space for Neural Network Inference...


Taught by

USENIX

Related Courses

Optimize TensorFlow Models For Deployment with TensorRT
Coursera Project Network via Coursera
Jetson Xavier NX Developer Kit - Edge AI Supercomputer Features and Applications
Nvidia via YouTube
NVIDIA Jetson: Enabling AI-Powered Autonomous Machines at Scale
Nvidia via YouTube
Jetson AGX Xavier: Architecture and Applications for Autonomous Machines
Nvidia via YouTube
Streamline Deep Learning for Video Analytics with DeepStream SDK 2.0
Nvidia via YouTube