A Post Incident Review Review
Offered By: USENIX via YouTube
Course Description
Overview
Explore a unique approach to post-incident reviews in this 44-minute conference talk from SREcon22 APAC. Discover how ANZx, a large organization in a highly regulated industry, has developed an unconventional process that eschews traditional elements like root cause analysis, action item tracking, and incident count reporting. Delve into the safety science concepts underpinning this methodology, including Rasmussen's Safety Model, Dekker's Tunnel, and James Reason's 'Swiss Cheese Model' of Accident Causation. Learn about the importance of narrative, debriefing, and timeline analysis in understanding incidents. Gain insights into effective brainstorming techniques, recommendation development, and the role of metrics and Service Level Objectives in improving system reliability. Examine the benefits of this approach in reducing repeat incidents and fostering a culture of continuous improvement.
Syllabus
Intro
Hands-on Guides
PIR Styles
Rasmussen's Safety Model
Dekker's Tunnel
Causal Map
James Reason's 'Swiss Cheese Model' of Accident Causation
Record vs Report
Narrative
Debrief
Walking the Timeline
Lessons
Brainstorming
Recommendations
Metrics
Service Level Objectives
Improvements
What's next?
Taught by
USENIX
Related Courses
How to Not Destroy Your Production Kubernetes Clusters (USENIX via YouTube)
SRE and ML - Why It Matters (USENIX via YouTube)
Knowledge and Power - A Sociotechnical Systems Discussion on the Future of SRE (USENIX via YouTube)
Tracing Bare Metal with OpenTelemetry (USENIX via YouTube)
Improving How We Observe Our Observability Data - Techniques for SREs (USENIX via YouTube)