YoVDO

Perseus - A Fail-Slow Detection Framework for Cloud Storage Systems

Offered By: USENIX via YouTube

Tags

FAST (File and Storage Technologies) Courses
Root Cause Analysis Courses

Course Description

Overview

This award-winning talk from FAST '23 presents Perseus, a practical framework for detecting fail-slow failures in cloud storage systems. Fail-slow failures affect both software and hardware: the affected component keeps functioning, but with degraded performance. Perseus uses a lightweight regression-based model to quickly identify and analyze performance degradation at the granularity of individual drives. Over 10 months of monitoring 248,000 drives, it found 304 fail-slow cases; isolating those drives reduced node-level 99.99th-percentile tail latency by 48%. The talk also covers a large fail-slow dataset assembled from production traces (41,000 normal drives and 315 verified fail-slow drives) and a root cause analysis of fail-slow drives that covers poorly implemented scheduling, hardware defects, and environmental factors. The 16-minute presentation is given by researchers from Shanghai Jiao Tong University, Alibaba Inc., Xiamen University, and Zhejiang Normal University, and is aimed at professionals and researchers working on cloud storage and system performance.

Syllabus

FAST '23 - Perseus: A Fail-Slow Detection Framework for Cloud Storage Systems


Taught by

USENIX

Related Courses

Fixing Healthcare Delivery
University of Florida via Coursera
Effective Problem-Solving and Decision-Making
University of California, Irvine via Coursera
Process Improvement
University of Illinois at Urbana-Champaign via Coursera
Problem-Solving and Decision-Making Skills
Edraak
Six Sigma Part 2: Analyze, Improve, Control
Technische Universität München (Technical University of Munich) via edX