
Refining Systems Data without Losing Fidelity

Offered By: USENIX via YouTube

Tags

SREcon Courses, Statistics & Probability Courses, Data Analysis Courses, Sampling Courses, Data Management Courses, Systems Engineering Courses

Course Description

Overview

Explore strategies for managing and refining systems data in large-scale observability infrastructure without compromising data fidelity. Learn how to scale back the flood of data while retaining the information needed to troubleshoot and understand production behavior. Discover statistical techniques for gathering accurate, specific, and error-bounded data on both the top-level performance and the inner workings of services. Examine methods that preserve the context of anomalous data flows and cases while preventing ordinary data from overwhelming the system. Delve into three key strategies: reducing data volume, reusing information through sampling, and recycling data through aggregation. Understand the importance of structuring data, choosing effective sample rates, and treating aggregation as a last resort. Gain insights on normalizing per key, retaining errors and slow queries, and harmonizing metrics with events to build an observability system that balances cost with comprehensive insight.
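As a rough illustration of the sampling idea described above, the sketch below keeps about one event in N and records the sample rate on each kept event, so that counts and averages can later be estimated by weighting. It is a minimal sketch, not the method presented in the talk: the names `SampledEvent`, `sample`, `estimated_count`, and `estimated_mean_duration` are hypothetical, and the event durations are synthetic.

```python
import random
from dataclasses import dataclass


@dataclass
class SampledEvent:
    duration_ms: float
    sample_rate: int  # this kept event stands in for `sample_rate` original events


def sample(durations, rate, rng):
    """Keep roughly one event in `rate`, tagging each kept event with that rate."""
    return [SampledEvent(d, rate) for d in durations if rng.random() < 1.0 / rate]


def estimated_count(events):
    """Estimate the original event count by summing the per-event sample rates."""
    return sum(e.sample_rate for e in events)


def estimated_mean_duration(events):
    """Estimate the original mean duration as a sample-rate-weighted average."""
    return sum(e.duration_ms * e.sample_rate for e in events) / estimated_count(events)


# Synthetic traffic (an assumption for illustration): 100,000 events, 5-50 ms each.
rng = random.Random(42)
durations = [rng.uniform(5.0, 50.0) for _ in range(100_000)]
kept = sample(durations, rate=20, rng=rng)

print("stored events:", len(kept))
print("estimated total:", estimated_count(kept))
print("estimated mean ms:", round(estimated_mean_duration(kept), 2))
```

Only about a twentieth of the events are stored, yet the weighted estimates stay close to the true totals because every kept event carries the rate at which it was sampled.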

Syllabus

Intro
Complex systems are hard to manage.
User experiences.
User experiences marbles.
without breaking the bank?
Three strategies for taming the spew.
Reduce. Reuse. Recycle.
Store less data.
Stop writing read-never data.
First, structure your data.
One event per transaction.
Often, trimming isn't enough.
Sample your data.
Statistics to the rescue!
Count 1/N events.
Count traces together.
Don't be afraid of sample rates.
Don't believe me? Ask a data scientist.
Aggregate data.
Aggregation destroys cardinality.
Temporal correlation is weak.
Math on quantiles is misleading.
Aggregation is a last resort.
How can sampling be cheap enough?
Systems scale with load.
Reconcile using the sample rate.
How can we save the relevant events?
Normalize per-key.
Different key, different probability.
Retain errors & slow queries.
Metrics and events can be friends!
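
The later syllabus items (normalize per key, different key with different probability, retain errors and slow queries, reconcile using the sample rate) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the speaker's implementation: the function names, the event dictionary fields (`key`, `duration_ms`, `error`), the `target_per_key` budget, and the `slow_ms` threshold are all hypothetical, and a real system would likely derive per-key counts from a previous time window rather than from the same batch.

```python
import random
from collections import Counter


def per_key_rates(key_counts, target_per_key):
    """Pick a sample rate per key so each key keeps roughly `target_per_key` events."""
    return {key: max(1, count // target_per_key) for key, count in key_counts.items()}


def choose_rate(event, rates, slow_ms):
    """Errors and slow events are always kept (rate 1); others use their key's rate."""
    if event["error"] or event["duration_ms"] >= slow_ms:
        return 1
    return rates.get(event["key"], 1)


def sample_events(events, rates, slow_ms, rng):
    """Keep each event with probability 1/rate and record the rate on the kept copy."""
    kept = []
    for event in events:
        rate = choose_rate(event, rates, slow_ms)
        if rate == 1 or rng.random() < 1.0 / rate:
            kept.append({**event, "sample_rate": rate})
    return kept


def estimated_counts(kept):
    """Reconcile per-key totals by summing the recorded sample rates."""
    totals = Counter()
    for event in kept:
        totals[event["key"]] += event["sample_rate"]
    return totals


# Synthetic traffic (an assumption): one very busy key, one rare key, a few errors.
events = (
    [{"key": "GET /home", "duration_ms": 20.0, "error": False} for _ in range(10_000)]
    + [{"key": "GET /home", "duration_ms": 20.0, "error": True} for _ in range(5)]
    + [{"key": "POST /export", "duration_ms": 80.0, "error": False} for _ in range(50)]
)
rates = per_key_rates(Counter(e["key"] for e in events), target_per_key=100)
kept = sample_events(events, rates, slow_ms=500.0, rng=random.Random(7))

print("rates:", rates)
print("stored events:", len(kept))
print("estimated totals:", dict(estimated_counts(kept)))
```

The busy key is sampled heavily while the rare key and every error are kept in full, and the stored sample rates still allow approximate per-key totals to be recovered.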


Taught by

USENIX
