Storage Systems at a Rapidly Scaling Startup - Instagram's Infrastructure Evolution
Offered By: Meta via YouTube
Course Description
Overview
Syllabus
Intro
Approach to data scaling problems
2 total engineers
First bottleneck: disk IO on old Amazon EBS
Django DB Routers
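As an illustration of the Django DB Routers item, here is a minimal sketch of a router that sends each app's models to its own database. The method names follow Django's documented router interface; the app labels and database aliases (`photos`, `likes`, `photos_db`, `likes_db`) are hypothetical and not taken from the talk.

```python
class DataTypeRouter:
    """Route models to a database alias based on their app label."""

    # Hypothetical mapping of app label -> database alias.
    ROUTES = {"photos": "photos_db", "likes": "likes_db"}

    def db_for_read(self, model, **hints):
        # Returning None lets Django fall through to the next router/default.
        return self.ROUTES.get(model._meta.app_label)

    def db_for_write(self, model, **hints):
        return self.ROUTES.get(model._meta.app_label)

    def allow_relation(self, obj1, obj2, **hints):
        a = self.ROUTES.get(obj1._meta.app_label)
        b = self.ROUTES.get(obj2._meta.app_label)
        if a is None and b is None:
            return None  # no opinion on unrouted apps
        return a == b  # only allow relations within one database

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        target = self.ROUTES.get(app_label)
        if target is None:
            return None  # no opinion on unrouted apps
        return db == target
```

In a Django project a class like this would be listed in the `DATABASE_ROUTERS` setting; the module path used there depends on the project layout.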
PG Replication to bootstrap nodes
Scaling up Redis
fork() and COW
Vertical partitioning by data type
No easy migration story; mostly double-writing
Replicating + deleting often leaves fragmentation
Why not Redis for kv caching?
Slab allocator
Focus on client
Testing & monitoring kept concurrent fires to a minimum
Scaling Out
Database Scale Out
Double write, shadow reads
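The double-write, shadow-read pattern named above can be sketched roughly as follows. This is an illustrative outline only: plain dicts stand in for the old and new datastores, and the class and attribute names are hypothetical.

```python
class MigratingStore:
    """Serve from the old store while exercising and verifying the new one."""

    def __init__(self, old, new, shadow_reads=True):
        self.old = old              # current source of truth
        self.new = new              # store being migrated to
        self.shadow_reads = shadow_reads
        self.mismatches = []        # keys where the two stores disagreed

    def put(self, key, value):
        # Double write: the old store stays authoritative.
        self.old[key] = value
        try:
            self.new[key] = value
        except Exception:
            # A failure in the new store must not break the request.
            self.mismatches.append(key)

    def get(self, key):
        value = self.old.get(key)
        if self.shadow_reads:
            # Shadow read: query the new store too, compare, discard result.
            if self.new.get(key) != value:
                self.mismatches.append(key)
        return value
```

Callers always receive the old store's answer; the mismatch list gives a signal for when the new store is safe to cut over to.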
Stressing about Primary Key
Data loss, segfaults
train + rapidly approaching cliff
Logical partitioning, done at application level
note to self: pick a power of 2 next time
Postgres "schemas"
9.2 upgrade: bucardo to move schema by schema
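To illustrate the logical-partitioning items above, here is a small sketch of mapping a user to a logical shard, a Postgres schema name, and a physical host. The shard count, schema naming pattern, and host names are assumptions for the example, not values from the talk.

```python
NUM_LOGICAL_SHARDS = 4096  # assumed value; a power of two, per the note above


def logical_shard(user_id: int) -> int:
    """Pick a logical shard in application code."""
    return user_id % NUM_LOGICAL_SHARDS


def schema_name(shard: int) -> str:
    """Each logical shard lives in its own Postgres schema (hypothetical naming)."""
    return f"shard{shard:04d}"


def physical_host(shard: int, hosts: list) -> str:
    """Assign contiguous ranges of logical shards to physical hosts."""
    per_host = NUM_LOGICAL_SHARDS // len(hosts)
    return hosts[shard // per_host]
```

With a power-of-two shard count and a host count that doubles, each host's range splits exactly in half, so moving to more machines means relocating whole schemas rather than re-bucketing rows.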
ID generation
Snowflake, other options
41 bits: time in millis (41 years of IDs)
13 bits: logical shard ID
10 bits: auto-incrementing sequence, modulo 1024
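The 41/13/10 bit layout above adds up to a 64-bit ID and can be sketched as below. The custom epoch and function names are assumptions for illustration. As a side note, 2^41 milliseconds is roughly 69.7 years, so the "41 years" figure quoted on the slide looks approximate.

```python
EPOCH_MS = 1_293_840_000_000  # assumed custom epoch: 2011-01-01 00:00:00 UTC

TIME_BITS, SHARD_BITS, SEQ_BITS = 41, 13, 10


def make_id(now_ms: int, shard_id: int, seq: int) -> int:
    """Pack timestamp, logical shard ID, and sequence into one 64-bit integer."""
    elapsed = now_ms - EPOCH_MS
    if not 0 <= elapsed < (1 << TIME_BITS):
        raise ValueError("timestamp outside the 41-bit range")
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard ID outside the 13-bit range")
    return (
        (elapsed << (SHARD_BITS + SEQ_BITS))
        | (shard_id << SEQ_BITS)
        | (seq % (1 << SEQ_BITS))  # sequence wraps modulo 1024
    )


def split_id(value: int) -> tuple:
    """Recover (elapsed_ms, shard_id, seq) from a packed ID."""
    seq = value & ((1 << SEQ_BITS) - 1)
    shard_id = (value >> SEQ_BITS) & ((1 << SHARD_BITS) - 1)
    elapsed = value >> (SHARD_BITS + SEQ_BITS)
    return elapsed, shard_id, seq
```

Because the timestamp occupies the high bits, IDs sort roughly by creation time, and the shard ID can be read back from any ID without a lookup.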
Lesson learned
minimize moving parts
Ending the year
Launched Android
Stability, FB
Scaling cut-overs, ramp-ups, and development
Dynamic ramp-ups and config
Python Knobs
Decouple deploy from feature rollout
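One way to picture the knob idea above is a percentage ramp-up that is read from configuration at request time, so code can ship dark and be enabled gradually. This is a hedged sketch: the knob name, the in-memory `KNOBS` dict, and the hashing scheme are hypothetical and not the talk's actual implementation.

```python
import zlib

# Hypothetical knob store: knob name -> percentage of users enabled (0-100).
# In practice this would live in dynamic config, not in deployed code.
KNOBS = {"new_feed_backend": 10}


def knob_enabled(name: str, user_id: int, knobs: dict = KNOBS) -> bool:
    """Return True if this user falls inside the knob's current ramp-up."""
    percent = knobs.get(name, 0)  # unknown knobs default to off
    # Stable per-user bucket in [0, 100), independent per knob name.
    bucket = zlib.crc32(f"{name}:{user_id}".encode()) % 100
    return bucket < percent
```

Because each user's bucket is stable, raising the percentage only ever adds users to the rollout; nobody flips back and forth as the knob moves from 10 to 50 to 100.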
In memory requirement
Simplest thing was breaking
Trimming
Cassandra (C*) cluster is 35% of the size of the Redis one
Handling deletes
Redis way: LREM
Not so hot for an AP system
2014 project
Spam fighting
Generic features + machine learning
Hadoop + Hive + Presto
2010 vintage infra
#1 impact: recruiting
Wrap up
Taught by
Meta Developers
Related Courses
Cybersecurity Policy for Water and Electricity Infrastructures
University of Colorado System via Coursera
Continuous Delivery & DevOps
University of Virginia via Coursera
Preparing for your Professional Cloud Architect Journey
Google Cloud via Coursera
Infrastructure Planning and Management
Indian Institute of Technology Madras via Swayam
Public Library Management
University of Michigan via edX