Why Are Distributed Systems So Hard?

Offered By: USENIX via YouTube

Tags

LISA (Large Installation System Administration) Conference Courses
Distributed Systems Courses
Consensus Algorithms Courses
CAP Theorem Courses
Observability Courses

Course Description

Overview

Explore the complexities of distributed systems in this 33-minute conference talk from LISA19. Delve into the history of distributed computing, debunk common myths about the CAP theorem, and understand why network partitions are inevitable. Examine popular consensus algorithms and their role in mitigating risks associated with distributed operations. Learn how to design systems that account for human factors, enhancing adaptability and reducing the impact of programmatic uncertainty. Gain insights into data evolution, scaling challenges, cloud computing, and shared-nothing architecture. Investigate the intricacies of unreliable message delivery, building observability, and the practical realities of hardware failures. Discover strategies for incident analysis, blameless discussions, and designing systems that prioritize human interaction and understanding.

Syllabus

Introduction
Agenda
Storytime
Data Evolution
Scaling
Cloud Computing
Why Scale Horizontally
What Does It Mean To Run A Distributed System
A Note On Distributed Computing
Summary
Shared Nothing Architecture
Unreliable Message Delivery
Why Are We Fenced Off
Building Observability
What We Can Know
The CAP Theorem
C
Replication Lag
Consistency is a Spectrum
Availability is Not Binary
Partition Tolerance
Hardware
Hardware Failure
Cables
Sharks
Kevlar
Network Partitions
Resource Isolation
Process Suspension
Network Glitch
People do bad things
Why does this matter
Practical reality
The correctness result
Mitigation strategies
Consensus Algorithms
The Woods Theorem
Building Mental Models
Incident Analysis
Blameless Discussions
Mental Models
Human Failure
Alert Fatigue
User Mindsets
Designing Systems for Humans
HugOps


Taught by

USENIX

Related Courses

Advanced Operating Systems
Georgia Institute of Technology via Udacity
High Performance Computing
Georgia Institute of Technology via Udacity
GT - Refresher - Advanced OS
Georgia Institute of Technology via Udacity
Distributed Machine Learning with Apache Spark
University of California, Berkeley via edX
CS125x: Advanced Distributed Machine Learning with Apache Spark
University of California, Berkeley via edX