SLO Burn - Reducing Alert Fatigue and Maintenance Cost in Systems of Any Size

Offered By: USENIX via YouTube

Tags

LISA (Large Installation System Administration) Conference Courses
Prometheus Courses
Service-Level Objectives Courses

Course Description

Overview

Explore strategies for reducing alert fatigue and maintenance costs in systems of any size through this 43-minute conference talk from LISA18. Learn how to implement Service Level Objectives (SLOs) and error budgets to create more sustainable alerting practices as systems grow and become more complex. Discover the benefits of symptom-based alerting over cause-based alerting, and see a live demonstration using Prometheus to construct robust, low-maintenance alerting rules. Gain insights into maintaining system observability without relying on noisy cause-based alerts, applicable to environments ranging from 10 machines to 10 data centers. Presented by Jamie Wilkinson from Google, this talk offers practical solutions for SRE teams facing scaling challenges and the risk of burnout due to constant firefighting.
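
For readers new to error budgets, the idea behind the talk's title can be sketched in a few lines of Python. This is a minimal illustration, not material from the talk, whose demonstration uses Prometheus alerting rules. The function names (`error_budget`, `burn_rate`, `should_page`) and the threshold value in the usage note are assumptions made for this example.

```python
def error_budget(slo_target: float) -> float:
    """Fraction of requests that may fail while still meeting the SLO."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return 1.0 - slo_target


def burn_rate(errors: int, total: int, slo_target: float) -> float:
    """Observed error ratio divided by the error budget.

    A value of 1.0 means the budget is being spent exactly as fast as
    the SLO allows; higher values mean it will run out early.
    """
    if total <= 0:
        return 0.0  # no traffic in the window, so nothing was burned
    return (errors / total) / error_budget(slo_target)


def should_page(errors: int, total: int, slo_target: float,
                threshold: float) -> bool:
    """Symptom-based check: alert on budget burn, not on individual causes.

    `threshold` is a hypothetical tuning knob chosen by the operator.
    """
    return burn_rate(errors, total, slo_target) >= threshold
```

With a 99.9% SLO, 5 failed requests out of 1,000 gives a burn rate of about 5, so `should_page(5, 1000, 0.999, threshold=2.0)` returns `True`. The alert depends only on what users experienced, which is why it does not need to be rewritten each time a new failure cause appears.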

Syllabus

LISA18 - SLO Burn—Reducing Alert Fatigue and Maintenance Cost in Systems of Any Size


Taught by

USENIX

Related Courses

Developing a Google SRE Culture
Google Cloud via Coursera
Site Reliability Engineering: Measuring and Managing Reliability
Pluralsight
Developing a Google SRE Culture en Français
Google Cloud via Coursera
Identifying and Resolving Application Latency for Site Reliability Engineers
Google Cloud via Coursera