Confessions of a Systems Engineer - Learning from My 20+ Years of Failure
Offered By: USENIX via YouTube
Course Description
Overview
Explore a 39-minute conference talk from SREcon20 Americas in which David Argent, a systems engineer at Amazon, shares lessons learned from more than two decades of failures in running large-scale online services. Gain insights into practices for designing and operating complex systems, including minimizing the blast radius of changes, monitoring accurately and measuring thoroughly, automating mitigations, and designing to meet SLAs and resolve incidents quickly. Learn why processes and tools should be exercised regularly, how to enforce processes with technology, and how to transition service responsibilities carefully. Discover practical advice on offering degraded service modes, using functional gates before, during, and after releases, and redirecting or dropping traffic aggressively during incidents. Benefit from Argent's experience with building production-quality tools, sanitizing and verifying inputs, and understanding all the scenarios a service supports, so you can avoid costly mistakes in your own systems.
Syllabus
Intro
There Are No Safe Changes
Minimize the Blast Radius on Changes
Monitor Accurately and Measure Thoroughly
Automate Mitigations
Degraded Service Modes, or An Imperfect Experience Usually Beats a Nonexistent One
Use Functional Gates Pre-, Post- and During Releases
Design to Meet SLAs and Mitigate Incidents Quickly
Regularly Exercise All Processes and Tools
Enforce Processes with Technology
Redirect or Drop Traffic Aggressively During Incidents
Production Quality Tools
Sanitize and Verify Inputs
Understand All of the Scenarios You Support
Transition Service Responsibilities Carefully
Taught by
USENIX