Developing a Google SRE Culture
Offered By: Google Cloud via Coursera
Course Description
Overview
In many IT organizations, incentives are not aligned between developers, who strive for agility, and operators, who focus on stability. Site reliability engineering, or SRE, is how Google aligns incentives between development and operations and does mission-critical production support. Adoption of SRE cultural and technical practices can help improve collaboration between the business and IT. This course introduces key practices of Google SRE and the important role IT and business leaders play in the success of SRE organizational adoption.
Syllabus
- Welcome to Developing a Google SRE Culture
- This module provides a course overview. You will learn why this course is beneficial for IT and business leaders who want to embrace SRE culture, and what topics each module covers.
- DevOps, SRE, and Why They Exist
- This module explains the components of DevOps philosophy, why Site Reliability Engineering came to exist, and who in an organization can and should practice SRE.
- SLOs with Consequences
- This module covers the value of SRE to an organization, as well as the technical and cultural fundamentals related to reducing organizational silos and accepting failure as normal. Topics include the SRE technical practices of blameless postmortems, service-level objectives (SLOs), and error budgets, and the SRE cultural practices of blamelessness, psychological safety, unified vision, collaboration and communication, and knowledge sharing.
- Make Tomorrow Better than Today
- Continuous, gradual testing as well as automation are very important in SRE culture. This module covers the SRE technical concepts of continuous integration, continuous delivery, and canarying as they relate to the DevOps pillar of implementing gradual change. You'll learn about the concepts of toil and automation, and the idea of automating this year’s job away. You'll also learn about SRE cultural practices of design thinking, prototyping, and how you can support your teams through change.
- Regulate Workload
- In this module, you'll learn about SRE practices around measuring everything, specifically reliability and toil, and the concept of monitoring. We’ll also cover the cultural fundamentals of goal-setting, transparency, and data-driven decision making.
- Apply SRE in Your Organization
- In this module, we will talk about ways you can assess and understand your organization’s maturity and readiness for adopting SRE principles, practices, and culture. We’ll also discuss the types of skills to look for in hiring new SREs and how to upskill your current workforce. Lastly, we’ll give you advice on how to start thinking about setting up an SRE org, and the additional support our Google Cloud Professional Services teams can provide your organization as you continue on your journey to SRE.
- Final Assessment
- Test your overall knowledge of Google SRE technical and cultural practices with this summative quiz. You must score at least 80% to pass. This assessment is required to receive your course completion certificate.
Taught by
Google Cloud Training
Related Courses
- SRE Capstone (IBM via edX)
- SRE Fundamentals and Security (IBM via edX)
- Developing a Google SRE Culture en Français (Google Cloud via Coursera)
- Introduction to DevOps and Site Reliability Engineering (Linux Foundation via edX)
- Debugging Applications for Site Reliability Engineers (Google Cloud via Coursera)