Building Service Ownership Using Documentation, Telemetry, and a Chance to Make Things Better
Offered By: USENIX via YouTube
Course Description
Overview
Explore a comprehensive conference talk on building distributed service ownership in software development teams. Learn strategies for transitioning from monolithic teams to distributed ownership models, including effective documentation practices, oncall responsibility distribution, and clear objective setting. Discover the balance between human-driven processes and automated telemetry-based systems. Understand the critical role of team agency in successful service ownership, and gain insights into implementing distributed tracing, centralized documentation, dynamic alert delivery, and SLO determination. Examine the importance of postmortems as documentation and the three-piece puzzle of ownership, accountability, and agency in creating scalable people systems for operating software at scale.
Syllabus
Intro
Service ownership, defined
Obstacles to successful service ownership
Distributed tracing, defined
Relationships matter
Traces = raw material, not finished product
Centralized documentation
Why is documentation important?
Iterating toward ownership
More context -- mitigating factor
Dynamic alert delivery
Handling alerts
Improving postmortems
Postmortems are documentation
Why is improving oncall important?
Determining SLOs
Derive internal SLOs using tracing
Why are SLOs important?
3-piece puzzle review
Making changes
Ownership = Accountability + Agency
Taught by
USENIX
Related Courses
How to Not Destroy Your Production Kubernetes Clusters
USENIX via YouTube
SRE and ML - Why It Matters
USENIX via YouTube
Knowledge and Power - A Sociotechnical Systems Discussion on the Future of SRE
USENIX via YouTube
Tracing Bare Metal with OpenTelemetry
USENIX via YouTube
Improving How We Observe Our Observability Data - Techniques for SREs
USENIX via YouTube