The DiRT on Chaos Engineering at Google
Offered By: GOTO Conferences via YouTube
Course Description
Overview
Dive into the world of Chaos Engineering at Google with this insightful conference talk from GOTO 2021. Explore 15 years of disaster resiliency testing (DiRT) as Jason Cahoon, a Site Reliability Engineer at Google, shares valuable lessons learned from thousands of production system tests. Gain a comprehensive understanding of why and what Google tests, including various testing themes and the balance between practical and theoretical approaches. Learn how to bootstrap a disaster testing program, address concerns about breaking production systems, and effectively report results. Discover key insights from Google's experience and examine specific test examples, such as running at service level, toggling discriminators, operating without dependencies, and simulating hacks. Whether you're new to Chaos Engineering or looking to enhance your existing practices, this talk provides essential knowledge for building more resilient systems.
Syllabus
Intro
DiRT: Disaster Resiliency Testing
Why?
What we test?
Testing themes
Practical vs theoretical
How?
Picking what to test
Steps for bootstrapping a disaster testing program
Testing production vs testing in production
Really, you're breaking production though?!
Reporting on results
What have we learned?
Test example: Run at service level
Test example: Toggle the O-N / O-F-F discriminator
Test example: Run without dependencies
Test example: Hacked!
Taught by
GOTO Conferences
Related Courses
DevOps Foundations: Chaos Engineering (LinkedIn Learning)
Practical Chaos Engineering - Breaking Things on Purpose to Make Them More Resilient Against Failure (NDC Conferences via YouTube)
Patterns for Resilient Architecture (NDC Conferences via YouTube)
Antics, Drift, and Chaos (Strange Loop Conference via YouTube)
Challenges of Starting an SRE Team from Scratch in an Enterprise (USENIX via YouTube)