
The Secret Lives of SREs - Controlling the Costs of Coordination across Remote Teams

Offered By: USENIX via YouTube

Tags

SREcon Courses, Distributed Computing Courses, Incident Management Courses

Course Description

Overview

This 48-minute conference talk from SREcon20 Americas examines incident response and coordination in remote SRE teams. Dr. Laura Maguire presents three years of research on how engineering teams handle service outages, drawing on 62 cases across four organizations. The findings challenge existing models of the domain: incident management in practice differs from what the Google SRE guidance suggests, and incident command can hinder fast resolution. The talk covers the subtle choreography of cognitive work in fault management, the potential drawbacks of coordination tools, and strategies for adaptive choreography. It also shows how tooling and intra-organizational dependencies raise coordination costs across time and organizational boundaries, adding complexity for SREs. Further topics include coordinating multiple diverse perspectives, dealing with backup issues, and managing hidden complexity in distributed computing environments.

Syllabus

Introduction
The Secret Lives of SREs
Coordinate Multiple Diverse Perspectives
Backup Issues
Hidden Complexity
Outlier Event
Sarah
Sarah's Knowledge
Incident Response
Incident Command
Speed Bumps
Distributed Computing
Conclusion


Taught by

USENIX

Related Courses

Cloud Computing Concepts, Part 1
University of Illinois at Urbana-Champaign via Coursera
Cloud Computing Concepts: Part 2
University of Illinois at Urbana-Champaign via Coursera
Reliable Distributed Algorithms - Part 1
KTH Royal Institute of Technology via edX
Introduction to Apache Spark and AWS
University of London International Programmes via Coursera
Réalisez des calculs distribués sur des données massives (Perform Distributed Computations on Massive Data)
CentraleSupélec via OpenClassrooms