NetBouncer - Active Device and Link Failure Localization in Data Center Networks
Offered By: USENIX via YouTube
Course Description
Overview
Explore a comprehensive presentation on NetBouncer, an active failure localization system for data center networks. Learn how this innovative solution leverages IP-in-IP techniques to detect both device and link failures, ensuring high availability of data center services. Discover the challenges of accurately localizing failures among millions of servers and network devices, and understand how NetBouncer's algorithm integrates troubleshooting domain knowledge with machine learning to overcome real-world data inconsistencies. Gain insights into the system's deployment in Microsoft Azure's data centers, its performance in detecting spine router gray failures, and its negligible overheads on the server side. Delve into the intricacies of active probing, path selection, device failure detection, and link failure inference as you examine this robust framework for maintaining data center network reliability.
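The step from path-level probe results to per-link failure estimates, which the talk frames as an optimization problem, can be illustrated with a small sketch. This is not NetBouncer's actual algorithm. It assumes links fail independently, so a path's success probability is the product of its links' success probabilities, and it fits those probabilities by plain projected gradient descent on the squared error. The function name, link names, step size, and probe data are all hypothetical.

```python
from math import prod


def infer_link_success(paths, observed, steps=2000, lr=0.1):
    """Estimate per-link success probabilities from path-level probe results.

    paths:    list of tuples of link names (hypothetical identifiers);
              each link is assumed to appear at most once per path.
    observed: measured success rate of each path, in the same order.

    Assumption (not taken from the talk): links fail independently, so a
    path's success probability is the product of its links' probabilities.
    """
    links = sorted({link for path in paths for link in path})
    x = {link: 1.0 for link in links}  # start by assuming every link is healthy

    for _ in range(steps):
        grad = {link: 0.0 for link in links}
        for path, y in zip(paths, observed):
            err = prod(x[l] for l in path) - y
            for l in path:
                # derivative of the path product with respect to link l
                others = prod(x[m] for m in path if m != l)
                grad[l] += 2 * err * others
        for l in links:
            # gradient step, then clip back into the valid range [0, 1]
            x[l] = min(1.0, max(0.0, x[l] - lr * grad[l]))
    return x


# Made-up probe data: every path that crosses link "b" loses about 10% of probes.
paths = [("a", "b"), ("a", "c"), ("b", "c")]
observed = [0.9, 1.0, 0.9]
estimates = infer_link_success(paths, observed)
print(estimates)
```

In this toy input the only link shared by both lossy paths is "b", so the fit assigns the loss to it and leaves "a" and "c" near 1.0. A link whose estimate falls below some chosen threshold would then be flagged as suspect.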
Syllabus
Intro
This is a true story
Active probing system requires explicit and efficient probing
Observation vs. inference from path probing to failures
Real-world constraints complicate path selection
Device failure detection
Link failure inference: an optimization problem
Real-world data inconsistency induces false positives
Evaluation questions
Real cases: spine router gray failure
Accuracy comparison with previous algorithms
NetBouncer algorithm performance
NetBouncer has negligible overheads on the server side
Taught by
USENIX
Related Courses
4G Network Essentials
Institut Mines-Télécom via edX
Data Plane Programming
Karlstad University via Independent
Preparing for Google Cloud Certification: Cloud Network Engineer
Google Cloud via Coursera
CCNP Route 642-902 Implementing Cisco IP Routing
Udemy
Linux for Network Engineers: Practical Linux with GNS3
Udemy