YoVDO

Everything, Everywhere, All At Once - Datadog's Global Outage and Recovery

Offered By: CNCF [Cloud Native Computing Foundation] via YouTube

Tags

Incident Response Courses, DevOps Courses, Cloud Computing Courses, Kubernetes Courses, Disaster Recovery Courses, Scalability Courses, Infrastructure Management Courses

Course Description

Overview

Explore a critical keynote presentation detailing Datadog's massive global outage on March 8, 2023. Delve into the trigger of the incident and the extensive efforts required for recovery. Gain insights into the technical intricacies, including the loss of over 60% of Kubernetes nodes within an hour and the challenges faced in recovering tens of thousands of impacted nodes across hundreds of clusters. Learn valuable technical and community lessons derived from this challenging experience, as shared by Datadog's Senior Software Engineer Hemanth Malla and Principal Engineer Laurent Bernaille. Understand the complexities of large-scale cloud infrastructure management and disaster recovery in this 16-minute talk from the Cloud Native Computing Foundation (CNCF).

Syllabus

Keynote: Everything, Everywhere, All At Once - Hemanth Malla & Laurent Bernaille


Taught by

CNCF [Cloud Native Computing Foundation]

Related Courses

Financial Sustainability: The Numbers side of Social Enterprise
+Acumen via NovoEd
Cloud Computing Concepts: Part 2
University of Illinois at Urbana-Champaign via Coursera
Developing Repeatable Models® to Scale Your Impact
+Acumen via Independent
Managing Microsoft Windows Server Active Directory Domain Services
Microsoft via edX
Introduction aux conteneurs
Microsoft Virtual Academy via OpenClassrooms