Techniques for SLOs and Error Budgets at Scale
Offered By: Conf42 via YouTube
Course Description
Overview
Explore techniques for implementing Service Level Objectives (SLOs) and Error Budgets at scale in this conference talk from Conf42 Observability 2023. Dive into the challenges of quantifying latency for large-scale systems, learn about the differences between measuring availability and latency, and discover key strategies for democratizing SLOs and error budgets across engineering teams. Gain insights on using raw histograms for accurate latency measurements, decomposing histogram modes, and implementing multi-service SLOs and error budgets. Perfect for engineers and managers looking to improve their observability practices and maintain high-quality service levels in complex, large-scale environments.
Syllabus
intro
preface
have you used this in your career? traffic for total.rrd
hi, i'm fred
how do you implement slos for 1000 engineers?
books
sli: good vs bad requests
slo: good/bad time_range
eb: 1-slo, 1-0.9995 = 0.05%
keys to slo / error budget democratization
latency and availability
measuring availability is easy, measuring latency is not easy
quantifying latency at scale
a common mistake
"dr. histogram - how i learned to stop worrying and love latency bands"
use raw histograms, avoid sketches & approximations
decomposing histogram modes
multi service slos / error budgets
thank you, questions?
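The SLI, SLO, and error budget items in the syllabus can be illustrated with a short calculation. What follows is a minimal sketch and is not taken from the talk: the function names, the request counts, and the histogram bins are all hypothetical, and the latency SLI assumes the threshold sits exactly on a bin boundary of a raw histogram.

```python
from fractions import Fraction


def error_budget(slo: Fraction) -> Fraction:
    """Error budget as a fraction of requests: 1 - SLO."""
    return 1 - slo


def sli(good: int, total: int) -> Fraction:
    """Availability SLI: share of good requests among all requests."""
    return Fraction(good, total)


def budget_remaining(good: int, total: int, slo: Fraction) -> Fraction:
    """Bad requests still allowed in the window (negative means overspent)."""
    allowed_bad = error_budget(slo) * total
    actual_bad = total - good
    return allowed_bad - actual_bad


def latency_sli(histogram: dict, threshold: float) -> Fraction:
    """Latency SLI from raw histogram bins.

    `histogram` maps each bin's upper bound (seconds) to its request count.
    Requests in bins whose upper bound is <= threshold count as good.
    """
    total = sum(histogram.values())
    good = sum(n for upper, n in histogram.items() if upper <= threshold)
    return Fraction(good, total)


slo = Fraction("0.9995")
print(float(error_budget(slo)) * 100)        # budget as a percentage
print(budget_remaining(9996, 10000, slo))    # bad requests left to spend
print(latency_sli({0.1: 900, 0.25: 90, 1.0: 10}, 0.25))
```

Exact fractions are used instead of floats so that 1 - 0.9995 comes out as exactly 0.0005 (0.05%), matching the arithmetic in the syllabus.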
Taught by
Conf42
Related Courses
Developing a Google SRE Culture (Google Cloud via Coursera)
Site Reliability Engineering: Measuring and Managing Reliability (Pluralsight)
Developing a Google SRE Culture en Français (Google Cloud via Coursera)
Identifying and Resolving Application Latency for Site Reliability Engineers (Google Cloud via Coursera)