Availability, Latency, and Cost - Withstanding Regional Outages

Offered By: USENIX via YouTube

Tags

SREcon Courses
Sharding Courses
Cost Management Courses
System Architecture Courses
High Availability Courses
Incident Management Courses
System Resilience Courses

Course Description

Overview

Explore a conference talk on achieving high availability and low latency through multi-region deployment strategies. Dive into Netflix's journey of transforming regional resiliency from a cost driver to a strategic advantage. Learn about algebraic models, code practices, and incident management playbooks developed to refine multi-region operations at scale. Discover key considerations for determining the optimal number of regions, user steering methods, and failover procedures. Gain insights into the human and system dynamics involved in managing regional outages, and understand how Netflix turned routine failovers into a seamless process. Examine the architectural decisions, sharding techniques, and replication strategies that contribute to improved availability and performance.

Syllabus

Introduction
Agenda
Netflix
Island Model
Fear
Recovery vs Prevention
High Availability Overview
Algebraic Models
Latency
Cumulative Distribution Function
Cost
Architecture
Sharding
Region Replication
Closing Thoughts
Questions
Summary
Cost Function
Model Change
Alternatives


Taught by

USENIX
