Automating Performance Tuning with Machine Learning

Offered By: USENIX via YouTube

Tags

SREcon Courses
Machine Learning Courses
Kubernetes Courses
Microservices Courses
Reliability Engineering Courses
Performance Tuning Courses
Configuration Management Courses
Cost Optimization Courses

Course Description

Overview

Explore a novel approach to automating performance tuning using machine learning in this 22-minute conference talk from SREcon21. Dive into the challenges faced by Site Reliability Engineers (SREs) in optimizing application performance, stability, and availability through configuration management. Learn how reinforcement learning techniques can be leveraged to find optimal configurations based on specific optimization goals, such as minimizing service latency or cloud costs. Examine a practical example of optimizing Kubernetes microservice cost and latency by tuning container resources and JVM options. Analyze the discovered optimal configurations, identify the most impactful parameters, and gain valuable insights for tuning microservices. Understand the key requirements for implementing this innovative approach and how it transforms the performance tuning process. Discover how machine learning enables smart exploration and automated performance tuning, potentially improving cost efficiency by up to 77% and increasing throughput by 28% while meeting Service Level Objectives (SLOs).

Syllabus

Intro
SREs care about efficiency and performance
Tuning system configuration matters...
but it is getting harder and harder
Key requirements for a new approach
ML techniques for smart exploration
ML enables automated performance tuning
and a new performance tuning process
The target system: Online Boutique
Use Case: optimizing cost of K8s microservices while ensuring reliability
The reference architecture
The optimization goals & constraints
Best configuration found by ML in 24H improves cost efficiency by 77%
Best config: optimal resources assigned to microservices
Best config: higher performance & efficiency for the overall service (charts: baseline vs. best service throughput; baseline vs. best service response time)
Use Case: maximizing service performance & efficiency with JVM tuning
Best config: +28% throughput, and meeting SLOs
Best config: optimal JVM options
Key takeaways


Taught by

USENIX

Related Courses

Introduction aux conteneurs
Microsoft Virtual Academy via OpenClassrooms
DevOps for Developers: How to Get Started
Microsoft via edX
Configuration Management on Google Cloud Platform
Google via Coursera
Windows Server 2016: Infrastructure
Microsoft via edX
Introduction to SAP HANA Administration
SAP Learning