Lessons in Building Resilient Systems - Insights from Amazon and Meta
Offered By: Conf42 via YouTube
Course Description
Overview
Explore valuable insights on building resilient systems through real-life scenarios and practical strategies in this 20-minute conference talk from Conf42 IM 2024. Dive into various challenges faced by large-scale systems, including traffic floods, retry storms, and problematic commits. Learn essential prevention techniques such as defensive coding practices, effective logging, and error handling. Discover how to set up and mitigate alerts, prepare for high-velocity events, and conduct thorough self-reviews. Gain actionable takeaways to enhance system resilience based on experiences from industry giants Amazon and Meta.
Syllabus
Introduction: What Can Possibly Go Wrong?
Real-Life Scenario: Flood of Traffic
Real-Life Scenario: Retry Storm
Real-Life Scenario: Plan B Went Poorly
Real-Life Scenario: Bad Commit
Real-Life Scenario: Lack of Sufficient Ownership
Real-Life Scenario: Script Errors
Prevention Strategies: Defensive Coding Practices
Logging and Error Handling Best Practices
Setting Effective Alerts
Mitigation Strategies for Alerts
Preparing for High Velocity Events
Conducting a Self Review
Conclusion and Takeaways
Taught by
Conf42
Related Courses
Learn to Program: Crafting Quality Code
University of Toronto via Coursera
数据结构与算法 Data Structures and Algorithms
Peking University via Coursera
数据结构与算法第一部分 | Data Structures and Algorithms Part 1
Peking University via edX
Software Construction in Java
Massachusetts Institute of Technology via edX
Advanced Software Construction in Java
Massachusetts Institute of Technology via edX