YoVDO

An SLO-Driven Approach to Enhance Kubernetes Cluster Reliability

Offered By: CNCF [Cloud Native Computing Foundation] via YouTube

Tags

Conference Talks Courses Reliability Engineering Courses Service Level Objectives (SLOs) Courses

Course Description

Overview

Explore an SLO-driven approach to enhance Kubernetes cluster reliability in this conference talk from KubeCon + CloudNativeCon Europe 2021. Delve into the challenges of defining reliability for large-scale Kubernetes clusters and learn how Service Level Objectives (SLOs) can be effectively implemented. Discover the philosophy behind SLO-driven reliability engineering and gain insights from Ant Financial's experience with one of the world's largest Kubernetes clusters. Examine concrete cases and lessons learned in building SLO frameworks, covering aspects such as monitoring, alerting, and tracing. Understand why defining SLOs for Kubernetes services is more complex than for classic web services, and explore topics including fleet management, end-to-end (E2E) SLO design, fine-grained and component SLOs, alerting philosophy, and SLO management.

Syllabus

Thank You to Our Session Recording Sponsor
Outline
Motivation
Fleet Management
General Approach
SLO Approach
SLO Recap
What SRE cares on K8S?
E2E SLO Design
Fine-grained SLO
Component SLO
Overall SLO Graph
Why RatioRate is bad?
Alerting Philosophy
SLO Management


Taught by

CNCF [Cloud Native Computing Foundation]
