Canarying Well - Lessons Learned from Canarying Large Populations

Offered By: USENIX via YouTube

Tags

SREcon Courses
Software Development Courses

Course Description

Overview

This SREcon18 Europe talk by Google's Štěpán Davidovič covers canarying in production: controlled rollouts that limit the risk of changes to large-scale systems. It walks through common pitfalls and best practices, then lays out an end-to-end canary strategy. Topics include the competing priorities of a canary process, geographical distribution, high variance among replicas, and bimodal distributions, illustrated with examples involving a restarted service with a cache, a memory leak, before/after tests, and compound probability. The talk also argues for careful metric selection, favoring a small number of metrics and caution with meta-analysis, and closes with a three-step canary process for making production changes safer and more reliable.

Syllabus

Intro
Canarying: What is that?
What we're going to talk about
What we're not going to talk about
Conflicting Incentives
Triangle of Canarying Priorities
Example: Geographical distribution
Example: High variance among replicas
Example: Bimodal distribution
Example: Two metrics, different outliers
Takeaways 2
Example: Service With Cache, restarted
Example: Memory leak canary
Example: Before/after test
Example Takeaway
Example: Compound probability
Beware Meta Analysis
Prefer Few Metrics
Canary In These 3 Simple Steps
Canary In These 3-ish Simple Steps


Taught by

USENIX
