Observing from Incidents

Offered By: USENIX via YouTube

Tags

SREcon Courses, Incident Management Courses

Course Description

Overview

Explore techniques for improving system observability and incident response in this 43-minute conference talk from SREcon20 Americas. Learn how to leverage insights from numerous companies' successes and failures to enhance your organization's ability to detect and respond to incidents. Discover strategies for spreading hard-earned knowledge through effective observability practices and visualizations. Gain practical advice on how to productize the incident response process internally, ultimately reducing incident impact, enhancing customer experience, and alleviating stress on your team. Delve into methods for demystifying complex systems, moving beyond traditional alerts and dashboards to create a more robust and proactive approach to system reliability.

Syllabus

SREcon20 Americas - Observing from Incidents


Taught by

USENIX
