Watering the Roots of Resilience - Learning from Failure with Decision Trees
Offered By: USENIX via YouTube
Course Description
Overview
Explore how Site Reliability Engineers (SREs) can align their mental models with system reality in this 41-minute conference talk from SREcon23 Americas. Delve into the concept of adaptation in complex systems and learn the importance of resilience stress testing to expose the messy reality of software environments. Examine example chaos experiments and discover how to document and visualize mental models using decision trees, which can inform design improvements and further experiments. Gain insights into reasoning about stressors and surprises in systems, and acquire practical, open-source tools applicable to everyday SRE work. By the end, understand how decision trees empower SREs to enhance system resilience and adapt to changing conditions in complex sociotechnical environments.
Syllabus
SREcon23 Americas - Watering the Roots of Resilience: Learning from Failure with Decision Trees
Taught by
USENIX
Related Courses
AWS Elemental Live - Conductor
Amazon Web Services via AWS Skill Builder
DevOps Foundations: Chaos Engineering
LinkedIn Learning
A Love Letter to Isolation - Harnessing Isolation for System Resilience and Security
CNCF [Cloud Native Computing Foundation] via YouTube
Distri: Researching Fast Linux Package Management - Arch Conf 2020
media.ccc.de via YouTube
Architecting for Scale
GOTO Conferences via YouTube