SRE in the Small and in the Large
Offered By: USENIX via YouTube
Course Description
Overview
This conference talk from LISA16 covers the principles of Site Reliability Engineering (SRE) and their applicability to organizations of all sizes. It shows how SRE practices, often associated with large-scale systems engineering, can be applied in both small startups and major corporations like Google. Key topics include load testing, application state management, monitoring, dependency management, sharding, and distributed applications. The talk also addresses common objections to adopting SRE and the trade-offs involved, illustrates the principles with real-world examples and case studies, shares little-known facts about SRE, and weighs arguments for and against microservices architecture in the context of reliability engineering.
Syllabus
Introduction
Agenda
Introductions
SRE vs Software Engineering
SRE is a new way of engineering
What does SRE do
SRE in large companies
We don't have infinite chocolate
Success in Google
The SRE is doomed
Not all companies are doing SRE
Mini Europe
Story Time
Story Time 2
Pivot to the General
Load Tests
Export Application State
Monitor
Dependencies
Sharding
Distributed Applications
Little Known Fact
Most General Objection
Tradeoff
Exporting Application State
Debugging Without Application State
Bad Monitoring
Kitty and Bear
Dependency Testing
Stack Overflow
Precious Servers
Paxos
Distributed Consensus
Identifiers
Microservices
Arguments against microservices
Taught by
USENIX
Related Courses
Introduction to Cloud Infrastructure Technologies (Linux Foundation via edX)
Scalable Microservices with Kubernetes (Google via Udacity)
Introduction to Kubernetes (Linux Foundation via edX)
Architecting Distributed Cloud Applications (Microsoft via edX)
IBM Cloud: Deploying Microservices with Kubernetes (IBM via Coursera)