Perseus - A Fail-Slow Detection Framework for Cloud Storage Systems
Offered By: USENIX via YouTube
Course Description
Overview
This award-winning conference talk from FAST '23 presents Perseus, a practical framework for detecting fail-slow failures in cloud storage systems. Fail-slow failures are an emerging problem in both software and hardware components: the affected component keeps functioning, but with degraded performance. Perseus uses a lightweight regression-based model to quickly pinpoint and analyze fail-slow failures at the granularity of individual drives. Over 10 months of monitoring 248,000 drives, Perseus found 304 fail-slow cases, and isolating them reduced node-level 99.99th-percentile tail latency by 48%. The talk also covers a large fail-slow dataset assembled from production traces, comprising 41,000 normal drives and 315 verified fail-slow drives, and the root causes identified behind fail-slow drives, including poorly implemented scheduling, hardware defects, and environmental factors. The 16-minute presentation is given by researchers from Shanghai Jiao Tong University, Alibaba Inc., Xiamen University, and Zhejiang Normal University, and is aimed at practitioners and researchers working on cloud storage and system performance.
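To make the idea of regression-based detection at the drive level more concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names, the linear latency-versus-throughput model, the leave-one-out fit against peer drives, the `k` multiplier, and the sample numbers are all assumptions chosen for illustration. For each drive, the sketch fits a line to the samples of the other drives in the same node, derives an upper bound on expected latency, and flags the drive if its observed latency sits above that bound on average.

```python
from statistics import mean

def fit_line(points):
    """Ordinary least squares fit of latency = a + b * throughput."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    b = sxy / sxx if sxx else 0.0
    a = y_bar - b * x_bar
    return a, b

def residual_std(points, a, b):
    """Standard deviation of the residuals around the fitted line."""
    residuals = [y - (a + b * x) for x, y in points]
    return (sum(r * r for r in residuals) / max(len(residuals) - 2, 1)) ** 0.5

def slowdown_ratios(samples, k=3.0):
    """Mean ratio of observed latency to the peer-derived upper bound, per drive.

    samples maps a drive id to a list of (throughput, latency) pairs.
    The upper bound is the fitted latency plus k residual standard
    deviations (k is a hypothetical tuning knob, not a published value).
    """
    ratios = {}
    for drive, own in samples.items():
        # Fit on the other drives only, so a slow drive cannot mask itself.
        peers = [p for d, pts in samples.items() if d != drive for p in pts]
        a, b = fit_line(peers)
        sigma = residual_std(peers, a, b)
        ratios[drive] = mean(lat / (a + b * tp + k * sigma) for tp, lat in own)
    return ratios

def flag_fail_slow(samples, k=3.0, threshold=1.0):
    """Return the sorted ids of drives whose slowdown ratio exceeds threshold."""
    ratios = slowdown_ratios(samples, k)
    return sorted(d for d, r in ratios.items() if r > threshold)

# Made-up example: three healthy drives and one drive that is 3x slower.
throughputs = [100, 200, 300, 400, 500]          # hypothetical units
samples = {
    "d0": [(tp, 1.00 + 0.01 * tp) for tp in throughputs],
    "d1": [(tp, 1.05 + 0.01 * tp) for tp in throughputs],
    "d2": [(tp, 0.95 + 0.01 * tp) for tp in throughputs],
    "d3": [(tp, 3.0 * (1.00 + 0.01 * tp)) for tp in throughputs],
}
print(flag_fail_slow(samples))  # ['d3']
```

Comparing each drive against its peers under the same workload, rather than against a fixed latency threshold, is the design choice this sketch tries to convey: latency that would be normal under heavy load can still be abnormal for the throughput actually served.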
Syllabus
FAST '23 - Perseus: A Fail-Slow Detection Framework for Cloud Storage Systems
Taught by
USENIX
Related Courses
Fixing Healthcare Delivery (University of Florida via Coursera)
Effective Problem-Solving and Decision-Making (University of California, Irvine via Coursera)
Process Improvement (University of Illinois at Urbana-Champaign via Coursera)
Problem-Solving and Decision-Making Skills (Edraak)
Six Sigma Part 2: Analyze, Improve, Control (Technische Universität München (Technical University of Munich) via edX)