Building Service Ownership Using Documentation, Telemetry, and a Chance to Make Things Better
Offered By: USENIX via YouTube
Course Description
Overview
Explore a comprehensive conference talk on building distributed service ownership in software development teams. Learn strategies for transitioning from monolithic teams to distributed ownership models, including effective documentation practices, oncall responsibility distribution, and clear objective setting. Discover the balance between human-driven processes and automated telemetry-based systems. Understand the critical role of team agency in successful service ownership, and gain insights into implementing distributed tracing, centralized documentation, dynamic alert delivery, and SLO determination. Examine the importance of postmortems as documentation and the three-piece puzzle of ownership, accountability, and agency in creating scalable people systems for operating software at scale.
Syllabus
Intro
Service ownership, defined
Obstacles to successful service ownership
Distributed tracing, defined
Relationships matter
Traces = raw material, not finished product
Centralized documentation
Why is documentation important?
Iterating toward ownership
More context -- mitigating factor
Dynamic alert delivery
Handling alerts
Improving postmortems
Postmortems are documentation
Why is improving oncall important?
Determining SLOs
Derive internal SLOs using tracing
Why are SLOs important?
3-piece puzzle review
Making changes
Ownership = Accountability + Agency
Taught by
USENIX
Related Courses
Reliable Cloud Infrastructure: Design and Process en Español
Google Cloud via Coursera
Reliable Cloud Infrastructure: Design and Process en Français
Google Cloud via Coursera
Reliable Cloud Infrastructure: Design and Process 日本語版
Google Cloud via Coursera
Google Cloud Customer Care Fundamentals - 日本語版
Google Cloud via Coursera
Google Cloud Customer Care Fundamentals - Português Brasileiro
Google Cloud via Coursera