Mastering Chaos - Achieving Fault Tolerance with Observability-Driven Prioritized Load Shedding
Offered By: USENIX via YouTube
Course Description
Overview
Explore the challenges of microservices-based applications and learn about Aperture, an open-source tool for observability-driven prioritized load shedding, in this 40-minute conference talk from SREcon23 Asia/Pacific. Examine metastable failures such as cascading failures and retry storms, and see where current approaches fall short and how Aperture addresses those gaps. Review Aperture's architecture, including its control and data planes, and understand how it uses token buckets, weighted fair queuing, and concurrency limiting to prioritize workloads. Learn from real-world deployments of Aperture in cloud products, where prioritized load shedding of gRPC and GraphQL traffic protects multi-tenant databases from overload. The talk is presented by Harjot Gill and Hardik Shingala of FluxNinja, Inc.
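As a rough illustration of the idea behind prioritized load shedding, the sketch below pairs a token bucket with per-priority reserves, so that low-priority requests are rejected first as capacity runs low. It is a simplified stand-in written for this listing and does not reflect Aperture's actual implementation or API. It also does not implement weighted fair queuing. The class name `TokenBucket`, the function `admit`, and the priority labels and reserve values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Token bucket driven by an explicit clock so behavior is deterministic."""
    capacity: float          # maximum number of tokens the bucket can hold
    refill_rate: float       # tokens added per second
    tokens: float = 0.0      # tokens currently available
    last_refill: float = 0.0 # timestamp of the last refill, in seconds

    def refill(self, now: float) -> None:
        # Add tokens for the elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now

    def try_take(self, now: float, cost: float = 1.0, reserve: float = 0.0) -> bool:
        # Admit only if at least `reserve` tokens would remain afterwards.
        self.refill(now)
        if self.tokens - cost >= reserve:
            self.tokens -= cost
            return True
        return False


# Hypothetical priority classes: tokens that must stay untouched so that
# more important traffic can still be served.
RESERVED_TOKENS = {"critical": 0, "normal": 3, "background": 6}


def admit(bucket: TokenBucket, priority: str, now: float) -> bool:
    """Return True if the request is admitted, False if it is shed."""
    return bucket.try_take(now, reserve=RESERVED_TOKENS[priority])


if __name__ == "__main__":
    bucket = TokenBucket(capacity=10, refill_rate=1, tokens=10)
    for priority in ("background", "normal", "critical"):
        admitted = sum(admit(bucket, priority, now=0.0) for _ in range(10))
        print(priority, "admitted:", admitted)
```

In this sketch, background traffic stops being admitted while tokens are still left for normal and critical requests, which is the basic shape of shedding by priority. A production system would derive the limits from observed signals such as latency or queue depth instead of fixed constants.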
Syllabus
SREcon23 Asia/Pacific - Mastering Chaos: Achieving Fault Tolerance with Observability-Driven...
Taught by
USENIX
Related Courses
How to Not Destroy Your Production Kubernetes Clusters (USENIX via YouTube)
SRE and ML - Why It Matters (USENIX via YouTube)
Knowledge and Power - A Sociotechnical Systems Discussion on the Future of SRE (USENIX via YouTube)
Tracing Bare Metal with OpenTelemetry (USENIX via YouTube)
Improving How We Observe Our Observability Data - Techniques for SREs (USENIX via YouTube)