Everything, Everywhere, All At Once - Datadog's Global Outage and Recovery
Offered By: CNCF [Cloud Native Computing Foundation] via YouTube
Course Description
Overview
Explore a critical keynote presentation detailing Datadog's massive global outage on March 8, 2023. Delve into the trigger of the incident and the extensive efforts required for recovery. Gain insights into the technical intricacies, including the loss of over 60% of Kubernetes nodes within an hour and the challenges faced in recovering tens of thousands of impacted nodes across hundreds of clusters. Learn valuable technical and community lessons derived from this challenging experience, as shared by Datadog's Senior Software Engineer Hemanth Malla and Principal Engineer Laurent Bernaille. Understand the complexities of large-scale cloud infrastructure management and disaster recovery in this 16-minute talk from the Cloud Native Computing Foundation (CNCF).
Syllabus
Keynote: Everything, Everywhere, All At Once - Hemanth Malla & Laurent Bernaille
Taught by
CNCF [Cloud Native Computing Foundation]
Related Courses
Information Security Management in a Nutshell - SAP Learning
Identifying, Monitoring, and Analyzing Risk and Incident Response and Recovery - (ISC)² via Coursera
Enterprise Security Fundamentals - Microsoft via edX
Planning a Security Incident Response - Microsoft via edX
Introduction to Cybersecurity - Udacity