YoVDO

Confessions of a Systems Engineer - Learning from My 20+ Years of Failure

Offered By: USENIX via YouTube

Tags

SREcon, Change Management, Process Improvement, Systems Engineering

Course Description

Overview

Explore a 39-minute conference talk from SREcon20 Americas in which David Argent, a systems engineer at Amazon, shares lessons learned from more than two decades of failures in running large-scale online services. Starting from the premise that there are no safe changes, the talk covers practices for designing and operating complex systems: minimizing the blast radius of changes, monitoring accurately and measuring thoroughly, automating mitigations, and designing to meet SLAs and mitigate incidents quickly. It also stresses regularly exercising all processes and tools, enforcing processes with technology, and transitioning service responsibilities carefully. Practical advice includes offering degraded service modes, using functional gates before, during, and after releases, and aggressively redirecting or dropping traffic during incidents. Argent also draws on his experience to cover building production-quality tools, sanitizing and verifying inputs, and understanding all of the scenarios a service supports, so that you can strengthen your systems engineering skills and avoid costly mistakes.

Syllabus

Intro
There Are No Safe Changes
Minimize the Blast Radius on Changes
Monitor Accurately and Measure Thoroughly
Automate Mitigations
Degraded Service Modes, or An Imperfect Experience Usually Beats a Nonexistent One
Use Functional Gates Pre-, Post- and During Releases
Design to Meet SLAs and Mitigate Incidents Quickly
Regularly Exercise All Processes and Tools
Enforce Processes with Technology
Redirect or Drop Traffic Aggressively During Incidents
Production Quality Tools
Sanitize and Verify Inputs
Understand All of the Scenarios You Support
Transition Service Responsibilities Carefully


Taught by

USENIX

Related Courses

Introduction to Systems Engineering
University of New South Wales via Coursera
Systems Engineering: Theory & Practice
Indian Institute of Technology Kanpur via Swayam
The Art of Systems Engineering and Management 2.0
Moscow Institute of Physics and Technology via Coursera
MBSE: Model-Based Systems Engineering
University at Buffalo via Coursera
Electrical Engineering: Sensing, Powering and Controlling
University of Birmingham via FutureLearn