Observing from Incidents
Offered By: USENIX via YouTube
Course Description
Overview
Explore techniques for improving system observability and incident response in this 43-minute conference talk from SREcon20 Americas. Learn how to leverage insights from numerous companies' successes and failures to enhance your organization's ability to detect and respond to incidents. Discover strategies for spreading hard-earned knowledge through effective observability practices and visualizations. Gain practical advice on how to productize the incident response process internally, ultimately reducing incident impact, enhancing customer experience, and alleviating stress on your team. Delve into methods for demystifying complex systems, moving beyond traditional alerts and dashboards to create a more robust and proactive approach to system reliability.
Syllabus
SREcon20 Americas - Observing from Incidents
Taught by
USENIX
Related Courses
How to Not Destroy Your Production Kubernetes Clusters (USENIX via YouTube)
SRE and ML - Why It Matters (USENIX via YouTube)
Knowledge and Power - A Sociotechnical Systems Discussion on the Future of SRE (USENIX via YouTube)
Tracing Bare Metal with OpenTelemetry (USENIX via YouTube)
Improving How We Observe Our Observability Data - Techniques for SREs (USENIX via YouTube)