Gandalf - An Intelligent, End-To-End Analytics Service for Safe Deployment in Large-Scale Cloud Infrastructure
Offered By: USENIX via YouTube
Course Description
Overview
Explore an end-to-end analytics service designed for safe deployment in large-scale cloud infrastructure. Learn about Gandalf, a system developed by Microsoft Azure that assesses the impact of software rollouts quickly and robustly, so that a bad update is caught before it causes a widespread outage. Discover how Gandalf monitors and analyzes a range of fault signals and correlates them against ongoing rollouts using spatial and temporal algorithms. Understand the core decision logic, which combines an ensemble ranking algorithm with a binary classifier to determine whether a rollout is safe. Gain insight into Gandalf's lambda architecture, which provides both real-time and long-term deployment monitoring with automated decisions and notifications. Examine the results from Microsoft Azure's production environment, where Gandalf achieved high precision and recall for both data-plane and control-plane rollouts. This conference talk from NSDI '20 is aimed at professionals working on large-scale cloud systems and deployment safety.
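To make the idea of correlating fault signals with ongoing rollouts more concrete, here is a minimal Python sketch of the temporal part only. It is not Gandalf's actual algorithm: the names `Deployment`, `Fault`, `blame_scores`, and `decide`, the one-hour window, and the fixed threshold are all assumptions made for illustration. In particular, the simple threshold stands in for the ensemble ranking and binary classifier described in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    rollout: str   # hypothetical rollout identifier
    node: str      # node that received the rollout
    time: float    # deployment time on that node, in seconds


@dataclass(frozen=True)
class Fault:
    node: str      # node that reported the fault
    time: float    # time the fault was observed, in seconds


def blame_scores(deployments, faults, window=3600.0):
    """Count, per rollout, the faults seen on a node within `window`
    seconds after that rollout reached the node (assumed heuristic)."""
    scores = {}
    for d in deployments:
        scores.setdefault(d.rollout, 0)
        for f in faults:
            if f.node == d.node and 0 <= f.time - d.time <= window:
                scores[d.rollout] += 1
    return scores


def decide(scores, threshold=2):
    """Placeholder decision rule: flag a rollout once its score reaches
    `threshold`. A real system would use a trained classifier instead."""
    return {r: ("no-go" if s >= threshold else "go") for r, s in scores.items()}
```

A fuller sketch would also weigh how faults are spread across nodes and clusters (the spatial side) and would compare against each node's fault rate before the rollout, neither of which is modeled here.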
Syllabus
NSDI '20 - Gandalf: An Intelligent, End-To-End Analytics Service for Safe Deployment in Large-Scale Cloud Infrastructure
Taught by
USENIX
Related Courses
Scaling Memcache at Facebook
USENIX via YouTube
Multi-Person Localization via RF Body Reflections
USENIX via YouTube
Opaque - An Oblivious and Encrypted Distributed Analytics Platform
USENIX via YouTube
Live Video Analytics at Scale with Approximation and Delay-Tolerance
USENIX via YouTube
Clipper - A Low-Latency Online Prediction Serving System
USENIX via YouTube