
Spike Detection in Alert Correlation at LinkedIn

Offered By: USENIX via YouTube

Tags

SREcon Courses
Microservices Courses
Distributed Systems Courses
Incident Response Courses
Outlier Detection Courses

Course Description

Overview

Explore spike detection techniques for alert correlation in large-scale microservices architectures through this SREcon21 conference talk. Delve into LinkedIn's approach to identifying root causes during production outages amidst thousands of interconnected services. Learn about the challenges of distinguishing genuine issues from false positives in a complex alert landscape. Discover how LinkedIn implemented anomaly detection using Modified Z-Score and Median Absolute Deviation (MAD) to streamline their alert correlation system. Gain insights into practical applications, challenges faced, and results achieved in reducing false escalations and minimizing issue resolution time. Understand the nuances of correlation versus causation in the context of microservices monitoring and troubleshooting.
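The overview names Modified Z-Score and Median Absolute Deviation (MAD) without spelling them out, so here is a minimal Python sketch of the general technique. It is not LinkedIn's implementation: the function names `modified_z_scores` and `find_spikes` are hypothetical, and the constant 0.6745 and cutoff 3.5 are the conventional textbook defaults for this method, assumed here rather than taken from the talk.

```python
from statistics import median


def modified_z_scores(values, consistency=0.6745):
    """Return the modified Z-score of each value.

    Uses the median and the Median Absolute Deviation (MAD) instead of the
    mean and standard deviation, so a single spike barely shifts the baseline.
    The 0.6745 constant is the conventional scaling factor (an assumption
    here, not a value confirmed by the talk).
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # At least half of the points equal the median, so the score is
        # undefined; treat points on the median as 0 and the rest as infinite.
        return [0.0 if v == med else float("inf") for v in values]
    return [consistency * (v - med) / mad for v in values]


def find_spikes(values, threshold=3.5):
    """Return the indices whose absolute modified Z-score exceeds the threshold.

    The default of 3.5 is the commonly recommended cutoff, assumed here.
    """
    scores = modified_z_scores(values)
    return [i for i, s in enumerate(scores) if abs(s) > threshold]


# Hypothetical per-minute error counts with one obvious spike at index 6.
error_counts = [10, 12, 11, 13, 12, 11, 95, 12]
print(find_spikes(error_counts))  # prints [6]
```

In the sample series the median is 12 and the MAD is 1, so the value 95 scores about 56 while every other point stays below 1.5 in absolute value. A mean-based Z-score would be pulled toward the spike itself, which is the usual motivation for choosing the median-based variant.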

Syllabus

Intro
Background: Quick Introduction to the LinkedIn Stack
LinkedIn Stack Under the Hood
Finding a Needle in a Haystack
Alert Correlation: A framework that automates the alert correlation process to identify unhealthy microservices
Alert Correlation: Slack Recommendations
A Real Issue
A Spike
Correlation does not mean Causation
Problem Statement: Finding the "right" needle in a needlestack
Modified Z-Score For Outlier Detection
MAD (Median Absolute Deviation)
A Simple Example
Spike Detection Challenges
Results: Spike vs Real


Taught by

USENIX

Related Courses

Information Security Management in a Nutshell
SAP Learning
Identifying, Monitoring, and Analyzing Risk and Incident Response and Recovery
(ISC)² via Coursera
Enterprise Security Fundamentals
Microsoft via edX
Planning a Security Incident Response
Microsoft via edX
Introduction to Cybersecurity
Udacity