Watering the Roots of Resilience - Learning from Failure with Decision Trees

Offered By: USENIX via YouTube

Tags

SREcon, Decision Trees, System Resilience

Course Description

Overview

Explore how Site Reliability Engineers (SREs) can align their mental models with system reality in this 41-minute conference talk from SREcon23 Americas. The talk covers adaptation in complex systems and explains why resilience stress testing is needed to expose the messy reality of software environments. It walks through example chaos experiments and shows how to document and visualize mental models as decision trees, which can then inform design improvements and further experiments. Along the way, learn how to reason about stressors and surprises in systems, and pick up practical, open-source tools for everyday SRE work. By the end, you will understand how decision trees help SREs improve system resilience and adapt to changing conditions in complex sociotechnical environments.

Syllabus

SREcon23 Americas - Watering the Roots of Resilience: Learning from Failure with Decision Trees
Taught by

USENIX