The Hunt for the Cluster Killer Bug - Debugging Erlang Legacy Code
Offered By: Code Sync via YouTube
Course Description
Overview
Embark on a thrilling debugging journey in this 47-minute conference talk from Code BEAM Europe 2022. Explore the unexpected challenges faced by Klarna's fault-tolerant Erlang system, Kred, when a seemingly minor Kafka outage led to a catastrophic cluster failure. Delve into the intricate process of identifying, fixing, and preventing the elusive "cluster-killer bug" through a series of unexpected twists and deep dives into the Erlang technology stack. Gain valuable insights into Erlang's memory model and acquire new tools for debugging low-level issues in Erlang applications. Follow along as the speaker navigates through system architecture, troubleshooting techniques, metric analysis, and lock-up testing, ultimately unraveling the mystery behind the system's vulnerability. Perfect for developers looking to enhance their debugging skills and gain a deeper understanding of fault tolerance in complex Erlang systems.
Syllabus
00:00 - Intro and Fault Tolerance
04:40 - System Architecture
08:28 - Troubleshooting
09:16 - Identify
13:30 - Fix
15:27 - Alert + Identify + Fix
20:37 - The incident
21:33 - Symptoms
27:28 - Validate
29:46 - The Path of Metrics
34:11 - Testing lock-ups
40:20 - The Mystery Term
Taught by
Code Sync
Related Courses
Heterogeneous Parallel Programming - University of Illinois at Urbana-Champaign via Coursera
Advanced Operating Systems - Georgia Institute of Technology via Udacity
計算機程式設計 (Computer Programming) - National Taiwan University via Coursera
Introduction to Operating Systems - Georgia Institute of Technology via Udacity
Android Performance - Google via Udacity