
NetBouncer - Active Device and Link Failure Localization in Data Center Networks

Offered By: USENIX via YouTube

Tags

USENIX Symposium on Networked Systems Design and Implementation (NSDI) Courses
Network Engineering Courses

Course Description

Overview

Explore a presentation on NetBouncer, an active failure localization system for data center networks. NetBouncer uses IP-in-IP encapsulation to probe explicitly chosen paths and detects both device and link failures, helping keep data center services highly available. Learn why it is hard to localize failures accurately among millions of servers and network devices, and how NetBouncer's algorithm combines troubleshooting domain knowledge with machine learning to cope with inconsistencies in real-world data. The talk also covers the system's deployment in Microsoft Azure's data centers, its detection of spine router gray failures, and its negligible server-side overhead. Topics include active probing, path selection, device failure detection, and link failure inference.
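The IP-in-IP probing idea can be illustrated by building a probe packet by hand. This is a minimal sketch, not NetBouncer's implementation. It assumes that the outer IPv4 header is addressed to a target switch and the inner header is addressed back to the sending server. The switch decapsulates the packet and forwards the inner packet, so the probe "bounces" off that switch along a known path. The function names (ipv4_header, udp_header, bounce_probe) and the UDP port 50000 are hypothetical. Only the header layouts (IPv4, UDP, and protocol number 4 for IP-in-IP) are standard. The sketch builds bytes only; actually sending them would require a raw socket and the matching privileges.

```python
import socket
import struct

IPPROTO_IPIP = 4   # IP-in-IP encapsulation (RFC 2003)
IPPROTO_UDP = 17


def checksum(data: bytes) -> int:
    """Internet checksum (RFC 1071): ones'-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header(src: str, dst: str, proto: int, payload_len: int, ttl: int = 64) -> bytes:
    """20-byte IPv4 header without options, with the header checksum filled in."""
    hdr = struct.pack(
        "!BBHHHBBH4s4s",
        (4 << 4) | 5,        # version 4, IHL = 5 32-bit words
        0,                   # DSCP/ECN
        20 + payload_len,    # total length
        0, 0,                # identification, flags/fragment offset
        ttl, proto,
        0,                   # checksum placeholder
        socket.inet_aton(src), socket.inet_aton(dst),
    )
    return hdr[:10] + struct.pack("!H", checksum(hdr)) + hdr[12:]


def udp_header(sport: int, dport: int, payload_len: int) -> bytes:
    """8-byte UDP header; a checksum of 0 means 'not computed', which IPv4 allows."""
    return struct.pack("!HHHH", sport, dport, 8 + payload_len, 0)


def bounce_probe(server_ip: str, switch_ip: str, payload: bytes,
                 sport: int = 50000, dport: int = 50000) -> bytes:
    """Hypothetical bounce probe: outer header -> switch, inner header -> back to server."""
    inner = (ipv4_header(server_ip, server_ip, IPPROTO_UDP, 8 + len(payload))
             + udp_header(sport, dport, len(payload))
             + payload)
    outer = ipv4_header(server_ip, switch_ip, IPPROTO_IPIP, len(inner))
    return outer + inner
```

For example, bounce_probe("10.0.0.1", "10.1.0.1", b"probe") yields a 53-byte packet: a 20-byte outer header with protocol 4, followed by the 33-byte inner IPv4/UDP probe.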

Syllabus

Intro
This is a true story
Active probing system requires explicit and efficient probing
Observation vs. inference from path probing to failures
Real-world constraints complicate path selection
Device failure detection
Link failure inference: an optimization problem
Real-world data inconsistency induces false positives
Evaluation questions
Real cases: spine router gray failure
Accuracy comparison with previous algorithms
NetBouncer algorithm performance
NetBouncer has negligible overheads on the server side
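The "link failure inference: an optimization problem" item can be illustrated with a small least-squares sketch. It assumes that each path's observed probe success ratio y_j is approximately the product of the success probabilities x_i of the links on that path. It then minimizes sum_j (y_j - prod_{i in p_j} x_i)^2 + lam * sum_i x_i * (1 - x_i) subject to 0 <= x_i <= 1, using coordinate descent. The lam term nudges each estimate toward 0 or 1. This formulation, the function name infer_link_success, and the parameters lam and iters are assumptions for illustration, not a transcription of NetBouncer's algorithm.

```python
from math import prod


def infer_link_success(paths, y, n_links, lam=0.1, iters=100):
    """Estimate per-link success probabilities x from per-path success ratios y.

    Assumed model: y[j] ~= product of x[i] over the links i on paths[j].
    Objective: sum_j (y[j] - prod x)^2 + lam * sum_i x[i] * (1 - x[i]),
    minimized by coordinate descent with 0 <= x[i] <= 1.
    """
    x = [1.0] * n_links  # start by assuming every link is healthy
    # For each link, the indices of the probed paths that traverse it.
    through = [[j for j, p in enumerate(paths) if k in p] for k in range(n_links)]
    for _ in range(iters):
        for k in range(n_links):
            js = through[k]
            if not js:
                continue  # link not covered by any probe: leave unchanged
            # c: product of the *other* links on each path, held fixed this step.
            c = [prod(x[i] for i in paths[j] if i != k) for j in js]
            # As a function of x[k] alone, the objective is a*x^2 - b*x + const.
            a = sum(cj * cj for cj in c) - lam
            b = 2 * sum(cj * y[j] for cj, j in zip(c, js)) - lam
            if a > 0:
                # Convex in x[k]: clip the unconstrained minimizer to [0, 1].
                x[k] = min(1.0, max(0.0, b / (2 * a)))
            else:
                # Concave or linear: the minimum lies at an endpoint; f(1) - f(0) = a - b.
                x[k] = 1.0 if a - b <= 0 else 0.0
    return x


if __name__ == "__main__":
    paths = [[0, 1], [0, 2], [1, 2]]
    y = [1.0, 0.5, 0.5]
    print([round(v, 3) for v in infer_link_success(paths, y, n_links=3)])
```

With three links and probed paths [0, 1], [0, 2], and [1, 2] whose observed success ratios are 1.0, 0.5, and 0.5, the estimates converge to roughly [1.0, 1.0, 0.5], which singles out link 2 as the lossy one. Coordinate descent fits here because, with all other links fixed, the objective is a one-variable quadratic in x[k], so each update has a closed form.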


Taught by

USENIX

Related Courses

Scaling Memcache at Facebook
USENIX via YouTube
Multi-Person Localization via RF Body Reflections
USENIX via YouTube
Opaque - An Oblivious and Encrypted Distributed Analytics Platform
USENIX via YouTube
Live Video Analytics at Scale with Approximation and Delay-Tolerance
USENIX via YouTube
Clipper - A Low-Latency Online Prediction Serving System
USENIX via YouTube