Tectonic-Shift - A Composite Storage Fabric for Large-Scale ML Training
Offered By: USENIX via YouTube
Course Description
Overview
Watch a 20-minute conference talk from USENIX ATC '23 on Tectonic-Shift, a composite storage fabric designed for large-scale machine learning training at Meta. The talk explains how the system meets the intensive IO and high-capacity storage demands of industrial ML training, and how workload characterization informed its hardware and software design. Tectonic-Shift combines Shift, a flash storage tier, with Tectonic, Meta's distributed file system, to maximize storage power efficiency. Shift uses application-aware cache policies that infer future access patterns from training dataset specifications, which lets it absorb 1.51-3.28x more IO than an LRU flash cache. In petabyte-scale production clusters, Tectonic-Shift reduces power demand by 29%.
Syllabus
USENIX ATC '23 - Tectonic-Shift: A Composite Storage Fabric for Large-Scale ML Training
Taught by
USENIX
Related Courses
Learn to Program: Crafting Quality Code
University of Toronto via Coursera
Introduction to Agile Software Development: Tools & Techniques
University of California, Berkeley via edX
Software Architecture & Design
Georgia Institute of Technology via Udacity
Software Design for Non-Designers
mooc.house via Independent
Técnicas Avançadas para Projeto de Software
Instituto Tecnológico de Aeronáutica via Coursera