Thanos Receiver Deep Dive - Stability and Incident Management
Offered By: CNCF [Cloud Native Computing Foundation] via YouTube
Course Description
Overview
Dive into the internals of Thanos Receiver in this conference talk. Thanos can ingest metrics via remote write from many sources at once, and the talk explores the challenges of tuning the receivers that handle this ingestion for stability. Learn from real-world incidents and how they shaped current approaches to running metric receivers in Kubernetes. Discover strategies for a stable setup that can withstand scheduled rollouts and node restarts, and hear about attempts to make receivers self-healing. Examine a surprising failure mode that affected multiple hard tenants, and what it implies for system reliability. Come away with practical knowledge for improving Thanos Receiver performance and stability in cloud-native environments.
Syllabus
Thanos Receiver Deep Dive - Joel Verezhak, Open Systems
Taught by
CNCF [Cloud Native Computing Foundation]