Tackling Kafka with a Small Team

Offered By: USENIX via YouTube

Tags

SREcon Courses
Distributed Systems Courses
Technical Debt Courses

Course Description

Overview

Learn how to deploy Kafka at scale with limited resources in this SREcon19 Americas conference talk. Explore the challenges of running a complex distributed system like Kafka with a small team during rapid business growth, and the solutions that addressed them. Discover tactical approaches to the demands that follow: executives who want to make real-time business decisions, product managers who want to run experiments, and application engineers who want visibility. Gain insights into engineering tradeoffs, technical debt management, and strategies for overhauling a client library. Follow one engineer's journey from failures to successes with Kafka, with lessons for teams facing similar distributed system challenges.

Syllabus

Intro
Who am I
What I do
Setting the stage
Why Kafka
What we needed
What we wanted
Jepsen analysis
Life comes at you fast
Executives want to make real-time business decisions
Product managers want to run experiments
Application engineers want visibility
Is Kafka down
Application Engineers' Perspective
Implicit Trust
Suffering from Success
The Problem with the Kind of Spiderman
No Dashboard
Library Ownership
Engineering Tradeoffs
Replacing Unhealthy Brokers
Replacing Unhealthy Hosts
Rolling Restarts
Technical Debt
Overhaul the Library
Python Ghost Shop
Conclusion


Taught by

USENIX

Related Courses

How to Not Destroy Your Production Kubernetes Clusters
USENIX via YouTube
SRE and ML - Why It Matters
USENIX via YouTube
Knowledge and Power - A Sociotechnical Systems Discussion on the Future of SRE
USENIX via YouTube
Tracing Bare Metal with OpenTelemetry
USENIX via YouTube
Improving How We Observe Our Observability Data - Techniques for SREs
USENIX via YouTube