Ten Persistent SRE Antipatterns - Pitfalls on the Road to a Successful SRE Program
Offered By: USENIX via YouTube
Course Description
Overview
Explore ten persistent antipatterns in Site Reliability Engineering (SRE) through this 54-minute conference talk from SREcon17 Americas. Discover common pitfalls organizations face when implementing SRE practices, including misconceptions about monitoring, incident response, configuration management, and automation. Learn how Google and Netflix approach the SRE role and why it differs from traditional systems administration. Gain insights into the importance of freedom, responsibility, trust, and controlled chaos in successful SRE programs. Understand how to avoid negative impacts on operations and empower teams to accomplish their mission effectively.
Syllabus
Intro
Launch Status Check
Service Outages
Host Alerts
What Makes a Good Alert
Noise Floor
SRE Burnout
War Rooms
Sharing
SpaceX
Reliability Theater
Incident Response
Monitoring
Virtualized Servers as Cattle
Containers vs Cattle
Configuration Management
Immutable Infrastructure
Configuration Management doesn't scale
Automation doesn't scale
Centralized tools
Automation
Design Systems
Automating
Burnout Team
Feature Releases
Embedded SME
Production Ready Checklist
Periodic Revisiting
Integrations
Uptime
Risk vs Reward
Dad Jokes
The Linkage
Chaos Monkey
Complex Systems
Real Stories
Interview
Perception
Taught by
USENIX
Related Courses
Introduction to Finance, University of Michigan via Coursera
Information Security and Risk Management in Context, University of Washington via Coursera
Financial Engineering and Risk Management, Columbia University via Coursera
Building an Information Risk Management Toolkit, University of Washington via Coursera
Caries Management by Risk Assessment (CAMBRA), University of California, San Francisco via Coursera