Tectonic-Shift - A Composite Storage Fabric for Large-Scale ML Training
Offered By: USENIX via YouTube
Course Description
Overview
Explore a 20-minute conference talk from USENIX ATC '23 on Tectonic-Shift, a composite storage fabric built for large-scale machine learning training at Meta. Discover how the system meets the intensive IO and high-capacity storage demands of industrial ML environments. Learn about the workload characterization that informed the hardware and software design, and understand the principles behind combining Shift, a flash storage tier, with Tectonic, Meta's distributed filesystem, to maximize storage power efficiency. Gain insights into application-aware cache policies that infer future access patterns from training dataset specifications and absorb 1.51-3.28x more IO than traditional LRU flash caches. Understand how Tectonic-Shift achieves a 29% reduction in power demand for petabyte-scale production clusters, paving the way for more scalable and efficient ML training infrastructure.
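To give a feel for why knowing future accesses helps a cache, here is a minimal sketch that compares plain LRU with a policy that is handed the full access trace in advance. This is not the Tectonic-Shift policy from the talk: the future-aware side is Belady's classic MIN eviction rule, used only as a stand-in, and the trace (three identical passes over ten blocks), the cache size, and the function names `lru_hits` and `future_aware_hits` are all assumptions made for illustration.

```python
from collections import OrderedDict


def lru_hits(trace, capacity):
    """Count cache hits for a plain LRU cache of the given capacity."""
    cache = OrderedDict()
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)  # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = True
    return hits


def future_aware_hits(trace, capacity):
    """Count hits when the whole trace is known up front (Belady's MIN rule)."""
    # For each position, record where the same block is read next (inf if never).
    next_use = [float("inf")] * len(trace)
    last_seen = {}
    for i in range(len(trace) - 1, -1, -1):
        next_use[i] = last_seen.get(trace[i], float("inf"))
        last_seen[trace[i]] = i

    cache = {}  # block -> position of its next use
    hits = 0
    for i, block in enumerate(trace):
        if block in cache:
            hits += 1
        elif len(cache) >= capacity:
            # Evict the cached block whose next use lies farthest in the future.
            victim = max(cache, key=cache.get)
            del cache[victim]
        cache[block] = next_use[i]
    return hits


if __name__ == "__main__":
    # Hypothetical workload: three epochs that each scan blocks 0..9 in order.
    trace = list(range(10)) * 3
    print("LRU hits:", lru_hits(trace, capacity=5))
    print("Future-aware hits:", future_aware_hits(trace, capacity=5))
```

On a repeated scan that is larger than the cache, LRU evicts every block just before it is needed again and so gets no hits, while the future-aware policy keeps part of the dataset resident across epochs. Real training workloads and the policies described in the talk are more involved than this toy model.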
Syllabus
USENIX ATC '23 - Tectonic-Shift: A Composite Storage Fabric for Large-Scale ML Training
Taught by
USENIX
Related Courses
Amazon DynamoDB - A Scalable, Predictably Performant, and Fully Managed NoSQL Database Service
USENIX via YouTube
Faasm - Lightweight Isolation for Efficient Stateful Serverless Computing
USENIX via YouTube
AC-Key - Adaptive Caching for LSM-based Key-Value Stores
USENIX via YouTube
The Future of the Past - Challenges in Archival Storage
USENIX via YouTube
A Decentralized Blockchain with High Throughput and Fast Confirmation
USENIX via YouTube