Automatic Generation of Runtime Checkers for Production Distributed Systems

Offered By: Strange Loop Conference via YouTube

Tags

Strange Loop Conference Courses, Distributed Systems Courses, Performance Improvement Courses

Course Description

Overview

Explore three systematic techniques for automatically generating effective, customized runtime checkers for large distributed systems in this 40-minute Strange Loop Conference talk. Learn about Panorama's approach to capturing in-situ observability, a program reduction method for identifying long-running regions and inserting watchdog hooks, and Oathkeeper's strategy for detecting silent semantic violations. Discover how these techniques can help detect and localize unexpected subtle failures in complex production environments, improving the reliability and availability of modern distributed systems. Gain insights from real-world failure studies and performance evaluations presented by Ryan Huang, an Assistant Professor at Johns Hopkins University specializing in computer systems research.
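As a rough illustration of the watchdog-hook idea named in the overview, the Python sketch below shows one way a hook in the main program could capture live arguments, which a separate checker then replays under a timeout. It is not the speaker's tooling: `WatchdogContext`, `run_checker`, and the status strings are hypothetical names chosen for this example, and the design assumes the checker is a small, side-effect-free stand-in for a long-running operation.

```python
import threading
from typing import Callable, Optional


class WatchdogContext:
    """Holds the most recent arguments captured by a hook in the main program."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._args: Optional[dict] = None

    def capture(self, **args) -> None:
        # Called from the main program (the "hook"); stores a copy of the arguments.
        with self._lock:
            self._args = dict(args)

    def snapshot(self) -> Optional[dict]:
        # Called from the watchdog; returns a copy so the checker cannot mutate state.
        with self._lock:
            return None if self._args is None else dict(self._args)


def run_checker(checker: Callable[..., None],
                context: WatchdogContext,
                timeout_s: float) -> str:
    """Run one checker in a worker thread.

    Returns "skipped" if the hook has not fired yet, "ok" if the checker
    finished, "error" if it raised, and "hang" if it exceeded the timeout.
    """
    args = context.snapshot()
    if args is None:
        return "skipped"

    result = {}

    def target() -> None:
        try:
            checker(**args)
            result["status"] = "ok"
        except Exception:
            result["status"] = "error"

    # Daemon thread so a stuck checker never blocks process exit.
    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        return "hang"
    return result["status"]
```

In use, the main program would call `context.capture(path=...)` just before a long-running operation, and a background loop would periodically call `run_checker` with a checker that exercises the same resource. Reporting "hang" separately from "error" matters because a stuck operation raises no exception, so a checker that only catches errors would miss it.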

Syllabus

Intro
Runtime checker (a.k.a. detector/monitor)
Importance of runtime checker
Current checking practice
Complex internals of modern software
Common to exhibit gray failures
A real-world gray failure
Failure root cause
Ideal runtime checkers
A new approach
Panorama: capture in-situ observability
Convert a program into in-situ observer
Identify observation boundary and identities
Extract evidence
Example of analysis
Detecting real-world gray failures
Timeline of detecting failure case f1
Latency overhead to observers
Program reduction approach
Why do reduction?
Identify long-running regions
Select checking target candidates
Reduce long-running methods
Encapsulate checkers
Insert watchdog hooks
Prevent side effects
Watchdog generation
Failure detection evaluation setup
Detecting real-world failures
Silent semantic violations
Real-world failure study
Oathkeeper: detect silent semantic violations
How to express semantics?
Oathkeeper workflow
Emitting semantic event traces
General semantic rule templates
Extracted semantic rules
Runtime overhead
Conclusions


Taught by

Strange Loop Conference

Related Courses

Advanced Operating Systems
Georgia Institute of Technology via Udacity
High Performance Computing
Georgia Institute of Technology via Udacity
GT - Refresher - Advanced OS
Georgia Institute of Technology via Udacity
Distributed Machine Learning with Apache Spark
University of California, Berkeley via edX
CS125x: Advanced Distributed Machine Learning with Apache Spark
University of California, Berkeley via edX