SRE in the Small and in the Large
Offered By: USENIX via YouTube
Course Description
Overview
This LISA16 conference talk examines the principles of Site Reliability Engineering (SRE) and how they apply to organizations of any size. Although SRE is usually associated with large-scale systems engineering, the talk shows how its practices can work in small startups as well as in major corporations like Google. It covers key SRE concepts, including load testing, exporting application state, monitoring, dependency management, sharding, and distributed applications. It also addresses common objections to adopting SRE and the trade-offs involved, illustrated with real-world examples and case studies. Along the way it shares a little-known fact about SRE and weighs arguments for and against microservices architecture from a reliability engineering perspective.
Syllabus
Introduction
Agenda
Introductions
SRE vs Software Engineering
SRE is a new way of engineering
What does SRE do
SRE in large companies
We don't have infinite chocolate
Success in Google
The SRE is doomed
Not all companies are doing SRE
Mini Europe
Story Time
Story Time 2
Pivot to the General
Load Tests
Export Application State
Monitor
Dependencies
Sharding
Distributed Applications
Little Known Fact
Most General Objection
Tradeoff
Exporting Application State
Debugging Without Application State
Bad Monitoring
Kitty and Bear
Dependency Testing
Stack Overflow
Precious Servers
Paxos
Distributed Consensus
Identifiers
Microservices
Arguments against microservices
Taught by
USENIX
Related Courses
Named Data Networking (USENIX via YouTube)
Release Engineering Best Practices at Google (USENIX via YouTube)
Efficiently Backing Up Terabytes of Data with PgBackRest (USENIX via YouTube)
Network-Based LUKS Volume Decryption with Tang (USENIX via YouTube)
The Devopsification of Windows Server 2016 (USENIX via YouTube)