YoVDO

SHADE - Enable Fundamental Cacheability for Distributed Deep Learning Training

Offered By: USENIX via YouTube

Tags

FAST (File and Storage Technologies) Courses
Distributed Deep Learning Courses

Course Description

Overview

Explore a novel approach to optimizing distributed deep learning training (DLT) in this conference talk from FAST '23. Dive into SHADE, a DLT-aware caching system that addresses the I/O performance bottleneck in accelerator-driven training environments. Learn how SHADE leverages importance sampling to detect fine-grained, per-sample variations in importance and make informed caching decisions for distributed DLT jobs. Discover its rank-based approach, which captures the relative importance of samples across minibatches and dynamically updates importance scores during training. Examine the improvements in cache hit ratio and overall training performance that SHADE achieves, particularly for computer vision models. Gain insight into the challenges posed by rapidly growing dataset sizes and the distinctive I/O workload behavior of DLT applications, and understand how SHADE's techniques can inform storage system design for deep learning.
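The core idea the talk describes, caching the samples whose importance scores are highest and updating those scores as training proceeds, can be sketched as follows. This is a minimal illustrative sketch of importance-aware cache eviction in general, not SHADE's actual implementation; the class and method names are hypothetical.

```python
class ImportanceAwareCache:
    """Toy sketch of importance-aware caching: retain the samples with the
    highest (dynamically updated) importance scores, evicting the
    lowest-scored cached sample when the cache is full.

    Illustrative only -- not SHADE's actual data structures or policy.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = {}   # sample_id -> cached sample data
        self.scores = {}  # sample_id -> latest importance score

    def update_score(self, sample_id, score):
        # In an importance-sampling setting, the score might be derived from
        # the sample's loss or gradient norm in the most recent minibatch.
        if sample_id in self.store:
            self.scores[sample_id] = score

    def put(self, sample_id, data, score):
        if sample_id in self.store:
            self.store[sample_id] = data
            self.scores[sample_id] = score
            return
        if len(self.store) >= self.capacity:
            # Find the least-important cached sample as the eviction victim.
            victim = min(self.store, key=lambda s: self.scores.get(s, 0.0))
            if self.scores.get(victim, 0.0) >= score:
                return  # new sample is not important enough to cache
            del self.store[victim]
            self.scores.pop(victim, None)
        self.store[sample_id] = data
        self.scores[sample_id] = score

    def get(self, sample_id):
        # Returns the cached data, or None on a cache miss.
        return self.store.get(sample_id)
```

For example, with a capacity of two, inserting samples with scores 0.9 and 0.1 and then a third sample with score 0.5 evicts the 0.1-scored sample; a sample scored below everything already cached is simply not admitted.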

Syllabus

FAST '23 - SHADE: Enable Fundamental Cacheability for Distributed Deep Learning Training


Taught by

USENIX

Related Courses

Understanding the Robustness of SSDs under Power Fault
USENIX via YouTube
BetrFS - A Right-Optimized Write-Optimized File System
USENIX via YouTube
F2FS - A New File System for Flash Storage
USENIX via YouTube
DNA Data Storage and Near-Molecule Processing for the Yottabyte Era
USENIX via YouTube
FAST '21 Work-in-Progress Reports
USENIX via YouTube