When One Line Took Thousands of Websites Offline - Lessons from a Kubernetes Incident

Offered By: USENIX via YouTube

Tags

Incident Analysis Courses, Kubernetes Courses, Disaster Recovery Courses, Configuration Management Courses

Course Description

Overview

This SREcon23 Europe/Middle East/Africa conference talk analyzes a critical incident in which a single-line change in a configuration management system made thousands of websites unavailable. It covers the infrastructure that limited wider damage, the lessons drawn for infrastructure design and operational procedures, and the significant improvements implemented in the aftermath. Topics include Kubernetes infrastructure, with a focus on operators, automation, manual intervention, configuration management, and backup strategies. CERN experts Francisco Borges Aurindo Barros and Jack Henschel walk through the intense recovery procedure and share their experience managing large-scale web infrastructures.

Syllabus

SREcon23 Europe/Middle East/Africa - When One Line Took Thousands of Websites Offline


Taught by

USENIX

Related Courses

Introduction to Cloud Infrastructure Technologies
Linux Foundation via edX
Scalable Microservices with Kubernetes
Google via Udacity
Google Cloud Fundamentals: Core Infrastructure
Google via Coursera
Introduction to Kubernetes
Linux Foundation via edX
Fundamentals of Containers, Kubernetes, and Red Hat OpenShift
Red Hat via edX