Cultivating Production Excellence

Offered By: NDC Conferences via YouTube

Tags

NDC Conferences, Teamwork, Risk Analysis, Data-Driven Decision Making, Observability, Service-Level Objectives

Course Description

Overview

A conference talk on cultivating production excellence in complex distributed systems. It covers practices for improving production environments: involving stakeholders, improving observability through collaboration, measuring with Service Level Objectives (SLOs), and prioritizing improvements with risk analysis. It discusses how to evolve your approach as systems grow more complex, tackles common problems such as noisy alerts and meaningless dashboards, and argues for investing in people, culture, and process rather than relying on tools alone. Topics include setting effective Service Level Indicators (SLIs), debugging novel failures in production, making debugging collaborative, moving away from hero culture, and quantifying risks to plan better. The talk shows how these practices make systems more humane and efficient to run, to the point of deploying confidently on Fridays.
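The syllabus frames a Service Level Indicator as a threshold that buckets each event as good or bad (its examples are HTTP 200 and a 100 ms latency) and an SLO as a target percentage over a time window. A minimal Python sketch of that idea follows. The `Event` fields, the exact good-event rule, the empty-window behavior, and the names `is_good`, `evaluate_slo`, and `SloReport` are hypothetical choices for illustration, not taken from the talk or from any product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    timestamp: datetime
    status_code: int
    latency_ms: float


@dataclass
class SloReport:
    total: int               # events inside the window
    good: int                # events the SLI classified as good
    attainment: float        # good / total
    met: bool                # attainment >= target
    budget_remaining: float  # bad events still allowed before the SLO is missed


def is_good(event: Event, max_latency_ms: float = 100.0) -> bool:
    """SLI: bucket a single event as good (True) or bad (False).

    Assumed rule for illustration: HTTP 200 and latency at or under 100 ms.
    """
    return event.status_code == 200 and event.latency_ms <= max_latency_ms


def evaluate_slo(events: list[Event], now: datetime,
                 window: timedelta, target: float) -> SloReport:
    """SLO: compare the share of good events in [now - window, now] to a target."""
    in_window = [e for e in events if now - window <= e.timestamp <= now]
    total = len(in_window)
    good = sum(1 for e in in_window if is_good(e))
    # An empty window is treated as meeting the SLO (an assumption, not a standard).
    attainment = good / total if total else 1.0
    allowed_bad = (1.0 - target) * total
    return SloReport(
        total=total,
        good=good,
        attainment=attainment,
        met=attainment >= target,
        budget_remaining=allowed_bad - (total - good),
    )


# Example: 998 good requests, one 500, one slow 200, plus one event outside the window.
now = datetime(2024, 1, 31)
events = [Event(now - timedelta(days=1), 200, 40.0) for _ in range(998)]
events.append(Event(now - timedelta(days=2), 500, 30.0))
events.append(Event(now - timedelta(days=3), 200, 250.0))
events.append(Event(now - timedelta(days=45), 500, 10.0))  # ignored: outside the 30-day window

report = evaluate_slo(events, now, window=timedelta(days=30), target=0.99)
print(report.attainment, report.met, round(report.budget_remaining, 6))
```

With these numbers, 998 of the 1,000 in-window events are good, so attainment is 0.998 and a 99% target is met with about 8 bad events of error budget to spare. A remaining budget like this is the kind of figure that could feed data-driven decisions, such as whether to keep shipping changes or slow down.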

Syllabus

Intro
Production is increasingly complex.
We're adding complexity all the time.
Our strategies need to evolve.
When we order the alphabet soup...
Noisy alerts. Grumpy engineers.
Walls of meaningless dashboards.
Tools aren't magical.
Invest in people, culture, & process.
Eliminate (unnecessary) complexity.
Our systems are always failing.
We need Service Level Indicators.
What threshold buckets events?
HTTP Code 200? Latency 100ms?
Set a target Service Level Objective.
Use a window and target percentage.
Data-driven business decisions.
Failure modes can't be predicted.
Support debugging novel cases. In production.
Allow forming & testing hypotheses.
Can you explain the variance?
Observability isn't just the data.
Debugging is not a solo activity.
Debugging is for everyone.
Collaboration is interpersonal.
Lean on your team.
Fix hero culture. Share knowledge.
Use the same platforms & tools.
Reward curiosity and teamwork.
Risk analysis helps us plan.
Quantify risks by frequency & impact.
And prioritize completing the work.
Don't waste time polishing chrome.
Lack of observability is systemic risk.
So is lack of collaboration.
A dozen engineers build Honeycomb.
We make systems humane to run.
Yes, we deploy on Fridays.
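The syllabus suggests quantifying risks by frequency and impact and then prioritizing the work. One simple way to do that, assumed here rather than taken from the talk, is to estimate user-weighted bad minutes per year (incidents per year × minutes to detect and resolve × fraction of users affected) and compare each risk with the yearly error budget implied by an SLO. In the sketch below, `Risk`, `prioritize`, and every number in the catalog are hypothetical.

```python
from dataclasses import dataclass

MINUTES_PER_YEAR = 365 * 24 * 60


@dataclass
class Risk:
    name: str
    incidents_per_year: float       # frequency estimate
    minutes_to_detect: float        # time until someone notices
    minutes_to_resolve: float       # time to mitigate once noticed
    fraction_users_affected: float  # 0.0 to 1.0

    def bad_minutes_per_year(self) -> float:
        """Expected user-weighted bad minutes per year: frequency x impact."""
        impact = (self.minutes_to_detect + self.minutes_to_resolve) * self.fraction_users_affected
        return self.incidents_per_year * impact


def prioritize(risks: list[Risk], slo_target: float) -> list[tuple[str, float, float]]:
    """Rank risks by expected cost and express each as a share of the yearly error budget."""
    budget_minutes = (1.0 - slo_target) * MINUTES_PER_YEAR
    ranked = sorted(risks, key=lambda r: r.bad_minutes_per_year(), reverse=True)
    return [(r.name, r.bad_minutes_per_year(), r.bad_minutes_per_year() / budget_minutes)
            for r in ranked]


# Hypothetical risk catalog; every number here is an illustrative estimate.
risks = [
    Risk("zone outage", 1, 5, 120, 0.25),
    Risk("bad deploy", 12, 10, 20, 0.5),
    Risk("certificate expiry", 0.5, 60, 30, 1.0),
]

ranking = prioritize(risks, slo_target=0.999)
for name, minutes, share in ranking:
    print(f"{name}: {minutes:.1f} bad minutes/year ({share:.0%} of budget)")
```

With these made-up estimates, bad deploys rank first at 180 bad minutes per year, about a third of the roughly 526-minute yearly budget for a 99.9% target, so they would be the first item to fix. Because `minutes_to_detect` appears in every risk, weak observability inflates all of them at once, which is one way to read the point that lack of observability is a systemic risk.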


Taught by

NDC Conferences
