How the OOM-Killer Deleted My Namespace, and Other Kubernetes Tales
Offered By: CNCF [Cloud Native Computing Foundation] via YouTube
Course Description
Overview
Syllabus
Intro
Datadog
Symptoms
Investigation
Deletion call, 4d before
Audit logs for the namespace
Spinnaker deploys (v1)
Helm 3 deploys (v2)
Big difference
What happened?
Namespace Controller logs Virtual
Events so far
Metrics-server setup
Metrics-server deployment
Full chain of events
Key take-away: Apiservice extensions are great but can impact your cluster
Context
Runtime is down?
CNI status
Containerd goroutine dump
Blocked goroutines?
Seems CNI related
What about Delete?
CNI plugin
The root cause
What we know
Apiserver requests
Illustration
What about label filters?
Informers instead of List
How do informers work?
Back to the incident
Nodegroup controller?
How did it work?
What we learned
Conclusion
Taught by
CNCF [Cloud Native Computing Foundation]
Related Courses
Introduction to Cloud Infrastructure Technologies
Linux Foundation via edX
Scalable Microservices with Kubernetes
Google via Udacity
Google Cloud Fundamentals: Core Infrastructure
Google via Coursera
Introduction to Kubernetes
Linux Foundation via edX
Fundamentals of Containers, Kubernetes, and Red Hat OpenShift
Red Hat via edX