How to Do SRE When You Have No SRE
Offered By: USENIX via YouTube
Course Description
Overview
Discover practical strategies for applying Site Reliability Engineering (SRE) principles in resource-constrained environments through this 28-minute conference talk from SREcon19 Europe/Middle East/Africa. Learn how to prioritize and address critical reliability issues, even with limited time and personnel, by focusing on key areas such as capacity requirements, security, infrastructure, and third-party dependencies. Gain insights into establishing effective rules, fostering a no-blame culture, and gradually improving system reliability while balancing other responsibilities. Apply these actionable tips to improve your organization's operational stability, reduce stress, and lower the risk of disasters, even without dedicated SRE staff.
Syllabus
Intro
Why are you here
TLDR
How is this different
CYA
Review
Capacity Requirements
Security
Calendar
Infrastructure
Third Parties
Domains/SSL Certs
Releases and Updates
What to do now
Rules
Publish Your Rules
Rule Examples
No Downside
No Blame
No Human Error
It's okay to not be okay
Code changes
Network issues
Keep going
Questions
Taught by
USENIX
Related Courses
How to Not Destroy Your Production Kubernetes Clusters
USENIX via YouTube
SRE and ML - Why It Matters
USENIX via YouTube
Knowledge and Power - A Sociotechnical Systems Discussion on the Future of SRE
USENIX via YouTube
Tracing Bare Metal with OpenTelemetry
USENIX via YouTube
Improving How We Observe Our Observability Data - Techniques for SREs
USENIX via YouTube