When One Line Took Thousands of Websites Offline - Lessons from a Kubernetes Incident
Offered By: USENIX via YouTube
Course Description
Overview
Explore a critical incident analysis in this SREcon23 Europe/Middle East/Africa conference talk, which examines how a single-line change in a configuration management system made thousands of websites unavailable. Delve into the infrastructure that mitigated wider damage, extract lessons on infrastructure design and operational procedures, and discover the significant improvements implemented in the aftermath. Gain insights into Kubernetes infrastructure, focusing on operators, automation, manual intervention, configuration management, and backup strategies. Learn from CERN experts Francisco Borges Aurindo Barros and Jack Henschel as they dissect the intense recovery procedure and share their experience managing large-scale web infrastructure.
Syllabus
SREcon23 Europe/Middle East/Africa - When One Line Took Thousands of Websites Offline
Taught by
USENIX
Related Courses
Incident Detection and Response: The Big Picture
Pluralsight
Integrated safety, health and environmental management: An introduction
The Open University via OpenLearn
Threat Intel Analysis of Ukrainians Power Grid Hack
YouTube
A Year in the Wild - Fighting Malware at the Corporate Level
Security BSides San Francisco via YouTube
Tales from the VOID - The Scary Truth about Incident Metrics
USENIX via YouTube