
Tolerating Slowdowns in Replicated State Machines Using Copilots

Offered By: USENIX via YouTube

Tags

OSDI (Operating Systems Design and Implementation) Courses
Distributed Systems Courses
Fault Tolerance Courses
Consensus Algorithms Courses
High Availability Courses

Course Description

Overview

Explore an approach to tolerating slowdowns in replicated state machines in this 20-minute conference talk from OSDI '20. The talk presents the Copilot replication protocol, described as the first 1-slowdown-tolerant consensus protocol: it maintains normal latency even when any single replica slows down. Learn how Copilot uses two distinguished replicas, dependencies between their logs, deduplication, and fast takeovers to mask slowdowns. The talk also covers two optimizations, ping-pong batching and null dependency elimination, that improve Copilot's efficiency. An evaluation compares Copilot against Multi-Paxos and EPaxos and shows that, of the three, only Copilot keeps latency low when a replica slows down. The talk covers the protocol's design, implementation, and evaluation, and is aimed at viewers interested in distributed systems, consensus algorithms, and high-availability architectures.
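
The idea of combining two logs through dependencies and deduplication can be illustrated with a minimal Python sketch. This is not the protocol from the talk: the names Entry and merge_logs, the encoding of a dependency as an index into the other log, and the rule that the pilot's entry runs first when two entries block each other are all assumptions made for illustration. It only shows how every replica holding the same two logs could derive the same execution order while running each command once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One slot in a log (hypothetical structure, not taken from the talk)."""
    command_id: str  # unique id the client attaches to its command
    dep: int         # index of the latest entry in the OTHER log that should
                     # run first, or -1 if there is none


def merge_logs(pilot_log, copilot_log):
    """Combine two logs into one deterministic, deduplicated order."""
    order = []     # command ids in execution order
    seen = set()   # command ids already executed (deduplication)
    p = c = 0      # next unexecuted index in each log

    def run(entry):
        # A command proposed in both logs is executed only the first time.
        if entry.command_id not in seen:
            seen.add(entry.command_id)
            order.append(entry.command_id)

    while p < len(pilot_log) or c < len(copilot_log):
        # An entry is ready once the other log has advanced past its dependency.
        pilot_ready = p < len(pilot_log) and pilot_log[p].dep < c
        copilot_ready = c < len(copilot_log) and copilot_log[c].dep < p
        if pilot_ready:
            run(pilot_log[p])
            p += 1
        elif copilot_ready:
            run(copilot_log[c])
            c += 1
        elif p < len(pilot_log):
            # Neither entry is ready (for example, a dependency cycle).
            # Assumption of this sketch: break the tie in favor of the pilot.
            run(pilot_log[p])
            p += 1
        else:
            # Only copilot entries remain; run them in log order.
            run(copilot_log[c])
            c += 1
    return order


# Both logs contain the same two commands; each one executes once.
pilot = [Entry("x", -1), Entry("y", 0)]
copilot = [Entry("x", 0), Entry("y", 1)]
print(merge_logs(pilot, copilot))  # ['x', 'y']
```

Because the merge depends only on the contents of the two logs, any replica that runs it on the same logs gets the same order. A real protocol also has to agree on the logs themselves and handle entries that are not yet committed, which this sketch leaves out.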

Syllabus

Intro
Replicated State Machine (RSM)
Fault Tolerance for High Availability
Slowdowns Hurt Availability
Slowdowns Take Different Forms
Defining Slowdown Tolerance
Multi-Paxos is Not 1-Slowdown-Tolerant
Copilot: First 1-Slowdown-Tolerant Protocol
Ordering: Use Two Logs
Ordering: Combine Logs with Dependencies
Ordering: Dependency Cycles
Ordering: A Tricky Case
Ordering: Same on All Replicas
Copilot Protocol: Dependencies?
Optimizations
Evaluation
Copilot and Fast-View-Change Tolera
Gradual Slowdown
Performance Without Slow Replicas
Conclusion


Taught by

USENIX

Related Courses

GraphX - Graph Processing in a Distributed Dataflow Framework
USENIX via YouTube
Theseus - An Experiment in Operating System Structure and State Management
USENIX via YouTube
RedLeaf - Isolation and Communication in a Safe Operating System
USENIX via YouTube
Microsecond Consensus for Microsecond Applications
USENIX via YouTube
KungFu - Making Training in Distributed Machine Learning Adaptive
USENIX via YouTube