Troubleshooting Tiered Tragedy - A Peek Into Failure

Offered By: GOTO Conferences via YouTube

Tags

GOTO Conferences Courses
Incident Response Courses

Course Description

Overview

Dive into a real-world failure scenario and learn the process Centro uses to uncover latent system issues in this 33-minute conference talk from GOTO Berlin 2019. Explore why it matters to watch monitoring signals before the problems behind them become catastrophic, and understand how viewing a system from multiple perspectives can provide clarity during incidents. Discover the vital role product organizations play in maintaining system uptime, and gain insights into troubleshooting tiered tragedies. Follow along as the speaker breaks down the failure model, basic architecture, deployment process, and timeline of events, then shares the lessons learned and potential improvements for preventing future system failures.

Syllabus

Intro
What went wrong
Subsystem
Failure Model
Basic Architecture
Deployment Process
Background Tasks
Postgres Locking
Alter Table Command
HTTP Stacking
Timeline of Events
Making Improvements
Possible Alerts
Case Closed
Human Processes
Operators
Staging
Pay Jobs
Auth Users
System Failure
Lessons Learned


Taught by

GOTO Conferences

Related Courses

Addressing Algorithmic Bias
GOTO Conferences via YouTube
Empowering Consumers - Evolution of Software in the Future
GOTO Conferences via YouTube
Why Static Typing Came Back
GOTO Conferences via YouTube
Higher Kinded Types in a Lower Kinded Language - Functional Programming in Kotlin
GOTO Conferences via YouTube
It's Not Hard to Test Smart - Delivering Customer Value Faster
GOTO Conferences via YouTube