Tackling Kafka with a Small Team
Offered By: USENIX via YouTube
Course Description
Overview
Learn about deploying Kafka at scale with limited resources in this SREcon19 Americas conference talk. Explore the challenges and solutions of managing a complex distributed system like Kafka with a small team while experiencing rapid business growth. Discover tactical approaches to overcoming obstacles, including handling executive demands, product manager experiments, and application engineer visibility needs. Gain insights into engineering tradeoffs, technical debt management, and library overhaul strategies. Follow one engineer's journey from failures to successes in conquering Kafka, offering valuable lessons for teams facing similar distributed system challenges.
Syllabus
Intro
Who am I
What I do
Setting the stage
Why Kafka
What we needed
What we wanted
Jepsen analysis
Life comes at you fast
Executives want to make realtime business decisions
Product managers want to run experiments
Application engineers want visibility
Is Kafka down?
Application Engineers' Perspective
Implicit Trust
Suffering from Success
The Problem with the Kind of Spiderman
No Dashboard
Library Ownership
Engineering Tradeoffs
Replacing Unhealthy Brokers
Replacing Unhealthy Hosts
Rolling Restarts
Technical Debt
Overhaul the Library
Python Ghost Shop
Conclusion
Taught by
USENIX
Related Courses
How to Not Destroy Your Production Kubernetes Clusters
USENIX via YouTube
SRE and ML - Why It Matters
USENIX via YouTube
Knowledge and Power - A Sociotechnical Systems Discussion on the Future of SRE
USENIX via YouTube
Tracing Bare Metal with OpenTelemetry
USENIX via YouTube
Improving How We Observe Our Observability Data - Techniques for SREs
USENIX via YouTube