YoVDO

Incident Management at Netflix Velocity

Offered By: USENIX via YouTube

Tags

LISA (Large Installation System Administration) Conference Courses
Storytelling Courses
Incident Management Courses

Course Description

Overview

Explore incident management strategies at Netflix in this 55-minute conference talk from USENIX LISA18. Discover how the streaming giant handles outages and failures while deploying thousands of changes daily and serving hundreds of millions of streaming hours. Learn about Netflix's journey with Chaos Engineering, including its experience with Chaos Monkey and the unexpected challenges it faced when introducing Latency Monkey. Understand how CORE, the centralized SRE team, adapted its approach to manage incidents at high velocity. Gain insights into how Netflix prepares for failures, why specialized expertise matters in incident handling, and the critical role of training for service operators. Examine the emphasis on post-incident learning and the goal of making each outage unique, so that the same failure does not recur. Delve into key takeaways that can help improve incident management practices in high-velocity environments.

Syllabus

LISA18 - Incident Management at Netflix Velocity


Taught by

USENIX

Related Courses

Emergency Management: Risk, Incidents and Leadership
Coventry University via FutureLearn
Security Operations
Coventry University via FutureLearn
Planificación y Coordinación en Logística Humanitaria
Acción contra el Hambre via Miríadax
Preparing for Google Cloud Certification: Cloud DevOps Engineer
Google Cloud via Coursera
Managing Cybersecurity
University System of Georgia via Coursera