YoVDO

Reliability

Offered By: NDC Conferences via YouTube

Tags

NDC Conferences Courses
Service Level Agreements Courses
Service-Level Objectives Courses
Service Level Indicators Courses

Course Description

Overview

Explore the concept of reliability in complex technology ecosystems through this NDC Porto 2022 conference talk by Ricardo Castro. Delve into the importance of understanding system changes and their impact on service provision. Learn about user expectations for system performance, including uptime, responsiveness, speed, consistency, and reliability. Discover how system reliability directly correlates to user satisfaction and business success. Examine the definition of reliability from a user-centric perspective and understand why perfection isn't always necessary. Gain insights into practical approaches for addressing reliability challenges with limited resources. Explore key concepts such as Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs). Learn how to create effective SLOs, document them, and set acceptable targets. Understand the importance of visualization, alerts, and error budget policies in maintaining system reliability. Discover the extended reliability stack and why these concepts are crucial for modern technology ecosystems.
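
As a rough illustration of how SLIs, SLOs, and error budgets relate, here is a minimal Python sketch. It is not taken from the talk: the function names, the request-based availability SLI, and the 30-day window are assumptions chosen for illustration only.

```python
def availability_sli(good_events: int, total_events: int) -> float:
    """SLI as the fraction of good events (e.g. successful requests)."""
    if total_events == 0:
        # Assumption: with no traffic, treat the service as fully reliable.
        return 1.0
    return good_events / total_events


def error_budget_remaining(sli: float, slo_target: float) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = 1.0 - slo_target  # allowed unreliability, e.g. 0.001 for a 99.9% SLO
    if budget <= 0:
        raise ValueError("an SLO target of 100% leaves no error budget")
    spent = 1.0 - sli  # observed unreliability
    return 1.0 - spent / budget


def allowed_downtime_minutes(slo_target: float, window_days: int = 30) -> float:
    """Back-of-the-envelope downtime allowed by an availability SLO."""
    return (1.0 - slo_target) * window_days * 24 * 60


if __name__ == "__main__":
    sli = availability_sli(good_events=999_500, total_events=1_000_000)
    print(f"SLI: {sli:.4%}")
    print(f"Error budget remaining: {error_budget_remaining(sli, 0.999):.1%}")
    print(f"Allowed downtime per 30 days: {allowed_downtime_minutes(0.999):.1f} min")
```

Under these assumptions, a 99.9% target over 30 days allows roughly 43 minutes of downtime, which shows why each additional "nine" in a target tends to cost more to achieve.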

Syllabus

Intro
An example from the real world
What is Reliability?
Service Level Indicator (SLI)
Service Level Objective (SLO)
How to create good SLOs
SLO Document
What is an acceptable target?
Back-of-the-envelope costs calculations
Service Level Agreement (SLA)
Visualization
Alerts
Error Budget Policy
Reliability Stack Extended
Why is this important?
Shameless plug


Taught by

NDC Conferences

Related Courses

Developing a Google SRE Culture
Google Cloud via Coursera
Site Reliability Engineering: Measuring and Managing Reliability
Pluralsight
Developing a Google SRE Culture en Français
Google Cloud via Coursera
Identifying and Resolving Application Latency for Site Reliability Engineers
Google Cloud via Coursera