Develop a Site Reliability Engineering (SRE) strategy
Offered By: Microsoft via Microsoft Learn
Course Description
Overview
- Module 1: Learn about SRE, an engineering discipline that helps you sustainably achieve the appropriate level of reliability in your systems, services, and products.
In this module, you will:
- Gain a basic understanding of Site Reliability Engineering (SRE)
- Learn how to get started with this valuable operations practice
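SRE commonly makes "the appropriate level of reliability" concrete as a service level objective (SLO) paired with an error budget: the amount of downtime the SLO still tolerates over a time window. The minimal Python sketch below shows that arithmetic. The helper names are hypothetical and the example assumes an availability SLO expressed as a fraction (0.999 for 99.9%).

```python
from datetime import timedelta


def error_budget(slo: float, window: timedelta) -> timedelta:
    """Downtime an availability SLO still tolerates over the given window."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction between 0 and 1, e.g. 0.999")
    return window * (1.0 - slo)


def budget_remaining(slo: float, window: timedelta, downtime: timedelta) -> float:
    """Fraction of the error budget left; negative once the budget is overspent."""
    return 1.0 - downtime / error_budget(slo, window)


# A 99.9% SLO over 30 days allows 43 minutes 12 seconds of downtime.
print(error_budget(0.999, timedelta(days=30)))  # prints 0:43:12
```

Spending half of that budget (21 minutes 36 seconds of downtime) leaves `budget_remaining(...)` at 0.5, which a team might use as a signal to slow down risky changes.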
- Module 2: Respond to incidents and activities in your infrastructure through alerting capabilities in Azure Monitor.
In this module, you'll:
- Configure alerts on events in your Azure resources based on metrics, log events, and activity log events.
- Learn how to use action groups in response to an alert, and how to use alert processing rules to override action groups when necessary.
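To illustrate how a metric alert, its action groups, and an alert processing rule interact, here is a small Python model. It is not the Azure Monitor API: every class and function name is hypothetical, the aggregation is assumed to be an average over the evaluation window, and the processing rule is reduced to a single maintenance-window suppression. In Azure, suppression stops the notifications while the alert itself still fires; this sketch models only which action groups get notified.

```python
import operator
from dataclasses import dataclass
from datetime import datetime
from statistics import mean

# Comparison operators a threshold rule might use.
OPERATORS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


@dataclass
class MetricAlertRule:
    """Hypothetical metric alert: fires when the window average crosses a threshold."""
    name: str
    threshold: float
    op: str = ">"
    action_groups: tuple = ()

    def fires(self, samples: list) -> bool:
        # Aggregate the evaluation window (here: the average) and compare it.
        if not samples:
            return False
        return OPERATORS[self.op](mean(samples), self.threshold)


@dataclass
class SuppressionWindow:
    """Hypothetical stand-in for an alert processing rule that suppresses
    notifications during a scheduled maintenance window."""
    start: datetime
    end: datetime

    def active(self, at: datetime) -> bool:
        return self.start <= at < self.end


def notify_targets(rule, samples, at, suppressions):
    """Return the action groups to notify for this evaluation (possibly none)."""
    if not rule.fires(samples):
        return []
    # An active suppression window overrides the rule's action groups.
    if any(window.active(at) for window in suppressions):
        return []
    return list(rule.action_groups)
```

With a rule such as `MetricAlertRule("high-cpu", 80.0, ">", ("ops-email",))`, CPU samples of 85, 90, and 78 average above 80 and notify `ops-email`, unless the evaluation time falls inside a suppression window.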
- Module 3: Learn how to capture trace output from your Azure web apps. View a live log stream and download log files for offline analysis.
In this module, you will:
- Enable application logging on an Azure Web App
- View live application logging activity with the log streaming service
- Retrieve application log files from an application with Kudu or the Azure CLI
- Module 4: Learn how to manage site reliability.
After completing this module, you'll be able to:
- Describe how site reliability engineering (SRE) empowers software developers to own the ongoing daily operation of their applications in production.
- Describe how Application Insights analyzes the performance of your web application and can warn you about potential problems.
- List the processes that you can implement to monitor site reliability.
- Build a "just culture" that balances safety and accountability.
- Module 5: Part of the Cloud Admin course from Dr. Majd Sakr at Carnegie Mellon University. Discover what cloud elasticity means and the different ways to scale your cloud resources.
In this module, you will:
- Describe common load patterns and how they drive the need to scale
- Enumerate the strategies and considerations in scaling cloud applications
- Discuss the advantages of auto-scaling and the mechanisms used to achieve it
- Describe the importance of load balancing in cloud applications and enumerate various methods to achieve it
- List the primary benefits of serverless computing and explain the concept of serverless functions
This content is provided in partnership with Dr. Majd Sakr and Carnegie Mellon University.
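One common auto-scaling mechanism is proportional target tracking: resize the fleet by the ratio of observed to target utilization, then clamp the result to allowed bounds. This is the rule documented for the Kubernetes Horizontal Pod Autoscaler. Azure autoscale settings are more often written as threshold rules that add or remove a fixed number of instances, so treat this Python sketch as an illustration of the idea rather than Azure's behavior. The function name and default bounds are hypothetical.

```python
import math


def desired_instances(current: int, observed_util: float, target_util: float,
                      min_instances: int = 1, max_instances: int = 10) -> int:
    """Proportional target-tracking rule, clamped to the allowed instance range."""
    if current < 1 or target_util <= 0:
        raise ValueError("need at least one instance and a positive target")
    # Scale the fleet by how far observed utilization is from the target.
    raw = math.ceil(current * observed_util / target_util)
    return max(min_instances, min(max_instances, raw))


print(desired_instances(4, observed_util=90, target_util=60))  # 6: scale out
print(desired_instances(4, observed_util=20, target_util=60))  # 2: scale in
```

Rounding up favors slight over-provisioning, and the clamp keeps a metric spike from requesting more capacity than the budget allows.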
- Module 6: Part of Carnegie Mellon University's Cloud Developer course. Learn how developers write programs that run on the cloud, including how to deploy them, make them fault-tolerant, balance load, scale, and deal with latency.
In this module, you will:
- Evaluate different considerations when programming applications that run on clouds
- Evaluate different considerations when deploying applications on clouds
- Compare and contrast proactive and reactive measures for fault tolerance in cloud applications
- Describe the importance of load balancing in cloud applications and enumerate various methods to achieve it
- Enumerate the strategies and considerations in scaling cloud applications
- Make the case for minimizing tail latency and discuss the various strategies to reduce it
- Describe the strategies to optimize total operational cost of using cloud services
In partnership with Dr. Majd Sakr and Carnegie Mellon University.
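Tail latency is usually reported as a high percentile such as p99. It matters because a request that fans out to many servers must wait for the slowest reply. The Python sketch below computes a nearest-rank percentile and the chance that a fan-out request hits at least one slow server. It assumes servers are slow independently of one another; the function names are hypothetical.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need samples and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def chance_request_is_slow(per_server_slow_prob, fan_out):
    """Probability that a request waiting on all `fan_out` servers sees at least
    one slow reply, assuming each server is slow independently."""
    return 1 - (1 - per_server_slow_prob) ** fan_out


latencies_ms = list(range(1, 101))
print(percentile(latencies_ms, 50), percentile(latencies_ms, 99))  # 50 99
print(round(chance_request_is_slow(0.01, 100), 3))  # 0.634
```

Even when each server is slow only 1% of the time, a request that fans out to 100 servers is slow about 63% of the time. This is why a rare per-server delay becomes a common user-facing one.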
- Module 7: Learn how to troubleshoot inbound network connectivity for Azure Load Balancer.
In this module, you will:
- Identify common Azure Load Balancer inbound connectivity issues.
- Identify steps to resolve issues when virtual machines aren't responding to health probes.
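Azure Load Balancer stops sending new flows to a backend instance whose health probe fails, which is why an unresponsive probe endpoint makes a VM look unreachable. The Python sketch below models generic threshold-based probe logic: a backend is marked down after a number of consecutive failures and back up after a number of consecutive successes. The class and its `unhealthy_threshold`/`healthy_threshold` parameters are hypothetical; consult the Azure Load Balancer documentation for the actual probe settings.

```python
from dataclasses import dataclass, field


@dataclass
class ProbeTracker:
    """Hypothetical per-backend health state driven by consecutive probe results."""
    unhealthy_threshold: int = 2   # consecutive failures before marking down
    healthy_threshold: int = 1     # consecutive successes before marking up again
    healthy: bool = True
    _streak: int = field(default=0, init=False, repr=False)

    def record(self, success: bool) -> bool:
        """Feed one probe result; return whether the backend should receive new flows."""
        if success == self.healthy:
            # Result agrees with the current state, so any contrary streak is broken.
            self._streak = 0
        else:
            self._streak += 1
            needed = self.healthy_threshold if success else self.unhealthy_threshold
            if self._streak >= needed:
                self.healthy = success
                self._streak = 0
        return self.healthy
```

Requiring consecutive failures keeps a single dropped probe from pulling a healthy VM out of rotation.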
- Module 8: Learn how to monitor the health of your Azure VMs by using Azure Metrics Explorer and metric alerts.
In this module, you will:
- Identify metrics and diagnostic data that you can collect for virtual machines
- Configure monitoring for a virtual machine
- Use monitoring data to diagnose problems
Syllabus
- Module 1: Introduction to Site Reliability Engineering (SRE)
- Introduction to Site Reliability Engineering
- What is SRE and why does it matter?
- SRE in context
- Key SRE principles and practices: virtuous cycles
- Key SRE principles and practices: The human side of SRE
- Getting started with SRE
- Summary
- Module 2: Improve incident response with alerting on Azure
- Introduction
- Explore the different alert types that Azure Monitor supports
- Use metric alerts for alerts about performance issues in your Azure environment
- Exercise - Use metric alerts to alert on performance issues in your Azure environment
- Use log alerts to alert on events in your application
- Use activity log alerts to alert on events within your Azure infrastructure
- Use action groups and alert processing rules to send notifications when an alert is fired
- Exercise - Use an activity log alert and an action group to notify users about events in your Azure infrastructure
- Summary
- Module 3: Capture Web Application Logs with App Service Diagnostics Logging
- Introduction
- Enable and configure App Service application logging
- Exercise - Enable and configure App Service application logging using the Azure portal
- View live application logging with the log streaming service
- Exercise - View live application logging with the log streaming service using Azure CLI
- Retrieve application log files
- Exercise - Retrieve application log files using the Azure CLI and Kudu
- Summary
- Module 4: Manage site reliability
- Introduction
- What is reliability engineering?
- What is Application Insights?
- Perform ongoing tuning to reduce meaningless alerts
- Analyze alerts to establish a baseline
- Blameless postmortems
- Knowledge check
- Summary
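Establishing a baseline and tuning away meaningless alerts can start with simple statistics. The Python sketch below derives a static threshold as the historical mean plus k sample standard deviations, lists which values would have alerted, and measures alert precision (the share of fired alerts that were actionable) as a tuning signal. The function names and the choice of k are hypothetical. Azure Monitor also offers dynamic thresholds that learn baselines automatically; this is a simpler, swapped-in technique.

```python
from statistics import mean, stdev


def baseline_threshold(history, k=3.0):
    """Static baseline: mean of past values plus k sample standard deviations."""
    if len(history) < 2:
        raise ValueError("need at least two historical values")
    return mean(history) + k * stdev(history)


def anomalies(values, threshold):
    """Values that would have triggered an alert against the baseline."""
    return [v for v in values if v > threshold]


def alert_precision(actionable_flags):
    """Share of fired alerts that someone actually had to act on."""
    if not actionable_flags:
        return 0.0
    return sum(actionable_flags) / len(actionable_flags)
```

A persistently low precision suggests raising k or the evaluation window; missed incidents suggest lowering it.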
- Module 5: Scale your cloud resources with elasticity
- Introduction
- Compute load patterns
- Scaling compute resources
- Automated scaling on the cloud
- Load balancing
- Serverless computing
- Summary
- Module 6: Build applications on the cloud
- Introduction
- Programming the cloud
- Deploy applications on the cloud
- Build fault-tolerant cloud services
- Load balancing
- Scale resources
- How to deal with tail latency
- Economics for cloud applications
- Summary
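Modules 5 and 6 both ask you to enumerate load-balancing methods. Two classic ones are round robin, which cycles through backends in order, and least connections, which picks the backend with the fewest active requests. The Python sketch below implements both with hypothetical class names. For comparison, Azure Load Balancer distributes flows by default using a hash of the connection's five-tuple rather than either of these methods.

```python
from itertools import cycle


class RoundRobin:
    """Hand out backends in a fixed rotating order."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("need at least one backend")
        self._rotation = cycle(backends)

    def pick(self):
        return next(self._rotation)


class LeastConnections:
    """Send each request to the backend with the fewest active connections."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("need at least one backend")
        self.active = {b: 0 for b in backends}

    def pick(self):
        # Ties go to the earliest backend in insertion order.
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        # Call when a request finishes so the count reflects current load.
        self.active[backend] -= 1
```

Round robin needs no state about load, while least connections adapts when some requests run much longer than others.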
- Module 7: Troubleshoot inbound network connectivity for Azure Load Balancer
- Introduction
- Troubleshoot Azure Load Balancer
- Diagnose issues by reviewing configurations and metrics
- Exercise - Set up your environment
- Exercise - Identify and resolve inbound network connectivity
- Summary
- Module 8: Monitor the health of your Azure virtual machine by using Azure Metrics Explorer and metric alerts
- Introduction
- Monitor the health of the virtual machine
- Exercise - Set up a VM with boot diagnostics
- View VM metrics
- Configure the Azure Diagnostics extension
- Exercise - Configure the Azure Diagnostics extension
- Diagnostic data case studies
- Exercise - Use diagnostic data
- Summary