Beyond Goldilocks Reliability
Offered By: USENIX via YouTube
Course Description
Overview
Explore reliability engineering in this 39-minute conference talk from SREcon21. Delve into the limitations of current best practices and the "Goldilocks approach" to reliability. Examine a concrete model for framing reliability and what it implies for answering complex questions about services. Investigate why certain mitigation strategies are effective and how aggregation and backend drains contribute to reliability. Learn how to identify the underlying mechanisms in order to reinforce desired reliability properties and develop new mitigation strategies. Discover how reliability can be modeled as stationarity, and how that model applies in practice to hierarchical diagnostics and to exposing reliability phenomena. Gain insight into the future capabilities of reliability engineering and draw conclusions for improving current practices.
Syllabus
Intro
Acknowledgements
Our Reliability Approach
Goldilocks Reliability
Load Bearing Assumptions
Practical Porridge Problems
The Trouble with Thresholds
Mo' Porridge Mo' Problems
Make More Models!
Model Elephants
Reliability, modeled as Stationarity
Stationarity Works!
Hierarchical Diagnostics
Stationarity Exposes Reliability Phenomena
Tantalizing Capabilities
Conclusions
Taught by
USENIX
Related Courses
How to Not Destroy Your Production Kubernetes Clusters
USENIX via YouTube
SRE and ML - Why It Matters
USENIX via YouTube
Knowledge and Power - A Sociotechnical Systems Discussion on the Future of SRE
USENIX via YouTube
Tracing Bare Metal with OpenTelemetry
USENIX via YouTube
Improving How We Observe Our Observability Data - Techniques for SREs
USENIX via YouTube