Cultivating Production Excellence

Offered By: NDC Conferences via YouTube

Tags

NDC Conferences Courses, Teamwork Courses, Risk Analysis Courses, Data-Driven Decision Making Courses, Observability Courses, Service-Level Objectives Courses

Course Description

Overview

This conference talk covers how to cultivate production excellence in complex distributed systems. Learn practices for improving production environments, including involving stakeholders, improving observability through collaboration, measuring reliability with Service Level Objectives (SLOs), and using risk analysis to prioritize improvements. The talk explains how to evolve your approach as systems grow more complex, how to deal with common problems such as noisy alerts and meaningless dashboards, and why to invest in people, culture, and process rather than in tools alone. It also covers setting effective Service Level Indicators (SLIs), debugging novel cases in production, debugging collaboratively, fixing hero culture, and quantifying risks for better planning. Together, these practices make systems more humane and efficient to run, to the point of deploying confidently on Fridays.

Syllabus

Intro
Production is increasingly complex.
We're adding complexity all the time.
Our strategies need to evolve.
When we order the alphabet soup...
Noisy alerts. Grumpy engineers.
Walls of meaningless dashboards.
Tools aren't magical.
Invest in people, culture, & process.
Eliminate (unnecessary) complexity.
Our systems are always failing.
We need Service Level Indicators.
What threshold buckets events?
HTTP Code 200? Latency 100ms?
Set a target Service Level Objective.
Use a window and target percentage.
Data-driven business decisions.
Failure modes can't be predicted.
Support debugging novel cases. In production.
Allow forming & testing hypotheses.
Can you explain the variance?
Observability isn't just the data.
Debugging is not a solo activity.
Debugging is for everyone.
Collaboration is interpersonal.
Lean on your team.
Fix hero culture. Share knowledge.
Use the same platforms & tools.
Reward curiosity and teamwork.
Risk analysis helps us plan.
Quantify risks by frequency & impact.
And prioritize completing the work.
Don't waste time chrome polishing.
Lack of observability is systemic risk.
So is lack of collaboration.
A dozen engineers build Honeycomb.
We make systems humane to run.
Yes, we deploy on Fridays.
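
The syllabus items on Service Level Indicators, Service Level Objectives, and risk analysis can be illustrated with a small sketch. The listing does not include the talk's own code, so everything below is an assumption made for illustration: the `Event` and `Risk` types, the function names, and the choice to measure impact in "bad minutes" are all hypothetical. The HTTP 200 and 100 ms thresholds are taken from the syllabus wording; the window length and target percentage are left as parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One request, as seen by the service (hypothetical shape)."""
    status: int        # HTTP status code returned
    latency_ms: float  # time taken to serve the request, in milliseconds
    timestamp: float   # when it happened, in seconds


def is_good(event, max_latency_ms=100.0):
    """SLI threshold: bucket an event as good if it returned 200 fast enough."""
    return event.status == 200 and event.latency_ms <= max_latency_ms


def sli(events, now, window_s):
    """Fraction of good events among those inside the trailing window."""
    recent = [e for e in events if now - window_s <= e.timestamp <= now]
    if not recent:
        return 1.0  # assumption: no traffic in the window counts as no failures
    return sum(is_good(e) for e in recent) / len(recent)


def meets_slo(events, now, window_s, target):
    """SLO check: is the SLI over the window at or above the target fraction?"""
    return sli(events, now, window_s) >= target


@dataclass(frozen=True)
class Risk:
    """A failure scenario, quantified by frequency and impact (hypothetical)."""
    name: str
    incidents_per_year: float
    bad_minutes_per_incident: float

    @property
    def expected_bad_minutes(self):
        # Expected yearly cost = how often it happens * how much it hurts.
        return self.incidents_per_year * self.bad_minutes_per_incident


def rank_risks(risks):
    """Order risks so the largest expected yearly impact comes first."""
    return sorted(risks, key=lambda r: r.expected_bad_minutes, reverse=True)


if __name__ == "__main__":
    # Made-up sample data, purely to show the calls.
    events = [Event(200, 40.0, float(t)) for t in range(99)]
    events.append(Event(500, 12.0, 99.0))
    print("SLI:", sli(events, now=99.0, window_s=3600.0))
    print("Meets 99% SLO:", meets_slo(events, 99.0, 3600.0, target=0.99))

    risks = [
        Risk("bad deploy", incidents_per_year=12, bad_minutes_per_incident=5),
        Risk("database failover", incidents_per_year=1, bad_minutes_per_incident=240),
    ]
    for risk in rank_risks(risks):
        print(risk.name, risk.expected_bad_minutes)
```

Multiplying frequency by impact puts rare, severe failures and frequent, minor ones on the same scale, so both can be compared against the error budget implied by the SLO target before deciding which work to prioritize.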


Taught by

NDC Conferences
