YoVDO

Deploying SRE Training Best Practices to Production - What We Learned

Offered By: USENIX via YouTube

Tags

SREcon Courses
Organizational Culture Courses
Observability Courses
Service-Level Objectives Courses

Course Description

Overview

Explore a 35-minute conference talk from SREcon18 Europe that delves into the lessons learned from deploying Site Reliability Engineering (SRE) training best practices to production. Learn about Google Ireland's journey in implementing SRE training, including the timeline, importance of learning, and key insights gained. Discover strategies for building sequential learning experiences, breaking real systems safely, continuous education, and fostering a culture of observability. Gain valuable insights from survey data and open-ended comments, and understand the importance of avoiding hero culture in SRE. This USENIX presentation offers practical takeaways for organizations looking to enhance their SRE training programs and improve overall reliability practices.

Syllabus

Intro
Agenda
Introduction
How did we get here
Timeline
Why Learning Matters
What Did We Learn
Building sequential learning experiences
Breaking real things
Ride shotgun
Continuous education
New to new territory
SLO Czar
Culture
Don't be a Hero
Observability
Survey Data
Survey Results
Open-ended Comments
Summary
Shoutouts


Taught by

USENIX

Related Courses

Certified Kubernetes Application Developer (CKAD)
A Cloud Guru
Certified Kubernetes Application Developer (CKAD) (Legacy)
A Cloud Guru
Kubernetes and Cloud Native Associate (KCNA)
A Cloud Guru
Amazon Connect APIs Intermediate
Amazon Web Services via AWS Skill Builder
Amazon DynamoDB – Monitoramento (Português) | Amazon DynamoDB - Monitoring (Portuguese)
Amazon Web Services via AWS Skill Builder