YoVDO

SRE 2.0: Amplifying Reliability with GenAI

Offered By: Conf42 via YouTube

Tags

Incident Response Courses, Generative AI Courses, Observability Courses, Service-Level Objectives Courses, Chaos Engineering Courses, Resilience Engineering Courses

Course Description

Overview

Explore the future of Site Reliability Engineering (SRE) in this 50-minute conference talk that delves into the integration of Generative AI to enhance system reliability. Learn about the evolving landscape of SRE, including the Gartner SRE hype cycle and the challenges of digital transformation. Discover how GenAI is revolutionizing observability, SLIs, SLOs, error budgets, system architecture, release engineering, incident response, automation, resilience engineering, and blameless postmortems. Gain insights into practical use cases, such as automated root cause analysis, optimal error budget allocation, and real-time incident response recommendations. Understand the potential of LLMs, retrieval-augmented generation, and LLM agents in SRE practices. Acquire knowledge on prompt engineering best practices, measuring progress with business outcomes, and pitfalls to avoid when implementing GenAI in SRE workflows.

Syllabus

intro
preamble
sre 2.0: amplifying reliability with genai
agenda
quick intro about myself
gartner sre hype cycle
sre
navigating digital transformation: managing ever-growing complexity
operations is a software problem
genai emerges: unveiling the power of next-gen artificial intelligence
unveiling the potential: the capabilities of llms
navigating challenges: risks associated with llms
addressing model challenges: finding effective solutions
retrieval-augmented generation (rag) / knowledge bases
llm agents
prompt engineering best practices
prompt engineering properties
sre 2.0
genai in observability
use case - analyze log data to automatically identify root causes of performance issues
genai in sli, slo, and error budgets
use case - recommend optimal error budget allocations based on business priorities and user expectations
genai in system architecture and recovery objectives
use case - predict the impact of different failure scenarios on system availability and performance
genai in release & incident engineering
use case - provide real-time incident response recommendations based on the current situation and historical data
genai in automation
use case - analyze the effectiveness of automation workflows and recommend improvements based on performance metrics
genai in resilience engineering
use case - automate the execution of chaos experiments based on identified risk factors and failure scenarios
genai in blameless postmortems
use case - analyze historical post-mortem data to identify recurring patterns and trends in incidents
measure progress with business outcomes
best practices
pitfalls to avoid
thank you.


Taught by

Conf42

Related Courses

DevOps Foundations: Chaos Engineering
LinkedIn Learning
Upgrading and Scaling DevOps Processes
Pluralsight
A Day in the Life of a Netflix Engineer
GOTO Conferences via YouTube
Antics, Drift, and Chaos
Strange Loop Conference via YouTube
Apache APISIX Ingress Controller with Litmus Chaos - Building Robust Systems
Conf42 via YouTube