Ten Persistent SRE Antipatterns - Pitfalls on the Road to a Successful SRE Program

Offered By: USENIX via YouTube

Tags

SREcon Courses
Risk Management Courses
Virtualization Courses
Incident Response Courses
Team Management Courses
Configuration Management Courses

Course Description

Overview

Explore ten persistent antipatterns in Site Reliability Engineering (SRE) through this 54-minute conference talk from SREcon17 Americas. Discover common pitfalls organizations face when implementing SRE practices, including misconceptions about monitoring, incident response, configuration management, and automation. Learn how Google and Netflix approach the SRE role and why it differs from traditional systems administration. Gain insights into the importance of freedom, responsibility, trust, and controlled chaos in successful SRE programs. Understand how to avoid negative impacts on operations and empower teams to accomplish their mission effectively.

Syllabus

Intro
Launch Status Check
Service Outages
Host Alerts
What Makes a Good Alert
Noise Floor
SRE Burnout
War Rooms
Sharing
SpaceX
Reliability Theater
Incident Response
Monitoring
Virtualized Servers as Cattle
Containers vs Cattle
Configuration Management
Immutable Infrastructure
Configuration Management doesn't scale
Automation doesn't scale
Centralized tools
Automation
Design Systems
Automating
Burnout Team
Feature Releases
Embedded SME
Production Ready Checklist
Periodic Revisiting
Integrations
Uptime
Risk vs Reward
Dad Jokes
The Linkage
Chaos Monkey
Complex Systems
Real Stories
Interview
Perception


Taught by

USENIX
