Reliability in Distributed Systems
Offered By: EuroPython Conference via YouTube
Course Description
Overview
This 42-minute conference talk from EuroPython 2018 covers reliability in distributed systems built from microservices and APIs: keeping a system stable, managing its dependencies, and identifying problems. It presents standard methods for keeping a system running when a dependency fails, and strategies for recovering from inconsistent states. Monitoring is treated at several levels: hardware resources, the application itself, external checks from the client side, functionality checks, and anomaly detection. The talk also covers error tracking, deployment strategy, environment management, and running effective postmortems. It is suitable for beginner and intermediate audiences.
Syllabus
Intro
Systems
Goal
What do we communicate with?
Timeouts
Circuit breakers
Async
Disaster recovery
Monitoring - resources
Monitoring - from client side
Monitoring - functionality
Monitoring - anomaly detection
Alerting - error tracking
Deployment - where to put it?
Environments
Monitoring - web server
Postmortems
Summary - prevent failing
Summary - monitoring
Summary - testing
Summary - architecture
Summary - logging
Summary - alerting
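Two of the syllabus items above, timeouts and circuit breakers, can be illustrated with a minimal sketch. This is not code from the talk: the class name `CircuitBreaker`, the exception `CircuitOpenError`, and the parameters `max_failures`, `reset_after`, and `clock` are all hypothetical names chosen for illustration. The idea shown is that after several consecutive failures of a dependency, callers stop waiting on it and fail fast until a cool-down period has passed.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Illustrative circuit breaker (hypothetical names, not from the talk)."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cool-down in seconds
        self.clock = clock                # injectable so tests can fake time
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Fail fast instead of waiting on a dependency known to be down.
                raise CircuitOpenError("dependency marked unavailable")
            # Cool-down elapsed: let one trial call through (half-open state).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many consecutive failures,
            # (re)opens the circuit and restarts the cool-down.
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

The wrapped function is expected to enforce its own timeout, for example by passing a `timeout` argument to the HTTP client it uses, so that a hung dependency surfaces as an exception the breaker can count.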
Taught by
EuroPython Conference
Related Courses
Advanced Operating Systems (Georgia Institute of Technology via Udacity)
High Performance Computing (Georgia Institute of Technology via Udacity)
GT - Refresher - Advanced OS (Georgia Institute of Technology via Udacity)
Distributed Machine Learning with Apache Spark (University of California, Berkeley via edX)
CS125x: Advanced Distributed Machine Learning with Apache Spark (University of California, Berkeley via edX)