YoVDO

Are We All on the Same Page? Let's Fix That

Offered By: USENIX via YouTube

Tags

SREcon Courses
Distributed Systems Courses

Course Description

Overview

Explore a conference talk from SREcon23 Asia/Pacific on the challenge of alerting effectively in complex distributed systems. Learn about Adaptive Paging, an alert handler that uses the causality captured in traces, together with OpenTelemetry's semantic conventions, to identify and notify the team closest to the problem. Discover how this approach lets organizations keep a symptom-based alerting strategy without paging every team on the request path and causing alert fatigue. Gain insights into setting thresholds derived from operation service level objectives, and understand how this method can improve incident response in large-scale, multi-team environments.
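The idea of finding "the team closest to the problem" from trace causality can be illustrated with a minimal sketch. This is not the implementation presented in the talk: the `Span` type, the `team_to_page` function, the `owners` mapping, and the rule of following the first failing child are all assumptions made for illustration. The only OpenTelemetry detail relied on is that each span's resource carries a `service.name` attribute.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Span:
    """Hypothetical, simplified view of a span in one trace."""
    span_id: str
    parent_id: Optional[str]
    service_name: str  # value of the OpenTelemetry `service.name` resource attribute
    is_error: bool     # True when the span status is ERROR


def team_to_page(spans: List[Span], alerting_span_id: str,
                 owners: Dict[str, str]) -> str:
    """Walk down from the span where the symptom was observed, following
    failing child spans, and return the owner of the deepest failing service.

    Assumption: when several children fail, the first one found is followed.
    """
    by_id = {s.span_id: s for s in spans}
    children: Dict[Optional[str], List[Span]] = {}
    for s in spans:
        children.setdefault(s.parent_id, []).append(s)

    current = by_id[alerting_span_id]
    while True:
        failing = [c for c in children.get(current.span_id, []) if c.is_error]
        if not failing:
            break  # no failing dependency below: this service is closest to the problem
        current = failing[0]
    return owners[current.service_name]


# Hypothetical trace: frontend -> checkout -> payments, plus a healthy search call.
trace = [
    Span("a", None, "frontend", True),
    Span("b", "a", "checkout", True),
    Span("c", "b", "payments", True),
    Span("d", "a", "search", False),
]
# Hypothetical service-to-team ownership mapping.
owners = {
    "frontend": "web-team",
    "checkout": "checkout-team",
    "payments": "payments-team",
    "search": "search-team",
}

paged = team_to_page(trace, "a", owners)
```

In this example the symptom is seen at `frontend`, but the errors propagate from `payments`, so only `payments-team` would be paged instead of all three teams on the failing path.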

Syllabus

SREcon23 Asia/Pacific - Are We All on the Same Page? Let's Fix That


Taught by

USENIX

Related Courses

How to Not Destroy Your Production Kubernetes Clusters
USENIX via YouTube
SRE and ML - Why It Matters
USENIX via YouTube
Knowledge and Power - A Sociotechnical Systems Discussion on the Future of SRE
USENIX via YouTube
Tracing Bare Metal with OpenTelemetry
USENIX via YouTube
Improving How We Observe Our Observability Data - Techniques for SREs
USENIX via YouTube