MySQL Automation at Facebook Scale

Offered By: USENIX via YouTube

Tags

SREcon Courses
Inventory Management Courses
Database Management Courses
System Architecture Courses

Course Description

Overview

Explore how Facebook manages its massive MySQL database clusters in this 50-minute SREcon15 talk. Discover how Facebook automates conventional DBA tasks to operate thousands of servers across multiple data centers. Delve into the design and architecture of the automation systems, including inventory management, automated provisioning, and central resource locking. Learn about server lifecycle management, including how full disks, RAID failures, and outdated MySQL versions are handled. Examine the metadata database design and the local triage process. Understand how Facebook turns up new servers and manages maintenance at scale. Gain insights from real-world incidents, such as "The Case of the Helpful Janitor" and large-scale server replacements. The talk offers an in-depth look at the MySQL automation strategies behind one of the world's largest database deployments.

Syllabus

Intro
What is this talk about?
Terminology
Inventory Management
Directory Service
Automated Provisioning
Automated Promotions
Central Resource Locking
Server's Lifecycle at Facebook
Full Disk
Bad RAID
Old MySQL Version
Metadata Database
Design Overview
Local Triage - MPS' Agent
Main Components
MPS, examples
Picking a destination
Chosen Algorithm
Result: Pretty Graphs
Turning Up New Servers
Maintenance at Scale
The Case of the Helpful Janitor
REPLACE ALL THE THINGS!
Prepare for takeoff!
What happened?


Taught by

USENIX

Related Courses

How to Not Destroy Your Production Kubernetes Clusters
USENIX via YouTube
SRE and ML - Why It Matters
USENIX via YouTube
Knowledge and Power - A Sociotechnical Systems Discussion on the Future of SRE
USENIX via YouTube
Tracing Bare Metal with OpenTelemetry
USENIX via YouTube
Improving How We Observe Our Observability Data - Techniques for SREs
USENIX via YouTube