Ten Things We've Learned From Running Production Infrastructure at Google

Offered By: GOTO Conferences via YouTube

Tags

GOTO Conferences Courses
Change Management Courses

Course Description

Overview

Explore key insights from Google's Site Reliability Engineering (SRE) practices in this 39-minute conference talk. Discover ten fundamental organizational principles learned from managing one of the world's most complex production infrastructures. Learn about the importance of reliability, the "cattle vs. pets" approach, blameless culture, effective measurement, failure modes, and automation. Understand why change is constant, why it is the leading cause of outages, why outages are inevitable, and why systems should have no "haunted graveyards". Gain practical knowledge on maintaining reliable, scalable, efficient, and agile production environments, drawn from Google's extensive experience in SRE.

Syllabus

Intro
Culture
1. Reliability can't be taken for granted
2. Cattle vs. Pets
3. Blamelessness
4. Measure what matters
A word on Ops
5. Failure modes
6. No heroes
7. Automation
Change is constant
8. Change is No. 1 reason for outages
9. Outages are inevitable
10. No haunted graveyards
What did we learn?
Outro


Taught by

GOTO Conferences

Related Courses

2021 Physician Leadership Virtual Journal Club: Change Management (RECORDING)
Stanford University via Independent
The 2022 Nursing Leadership Webinar Series - Part Three: What I Learned about Moving a Medical Center (and How it Impacted the Rest of My Nursing Career)
Dartmouth College via Independent
Accountability and Employee Engagement
University of Colorado Boulder via Coursera
Hands-On with AWS Systems Manager
A Cloud Guru
Adaptive Leadership
Acumen Academy