Why We Built Our Own Distributed Column Store

Offered By: Strange Loop Conference via YouTube

Course Description

Overview

Explore the architecture and implementation of Retriever, a custom-built distributed column store, in this 43-minute Strange Loop Conference talk. Learn how Honeycomb addressed the challenge of understanding complex distributed systems in production by developing a low-latency, schemaless database inspired by Facebook's Scuba. Discover the design decisions behind Retriever, including its use of disk storage, its efficient column-oriented storage model, and its handling of multi-tenancy and cost constraints. Gain insights into the write and read paths, data model, storage format, distributed queries, and fault tolerance mechanisms. Understand how Retriever ingests events from Kafka, manages quotas, and recovers from failures. Delve into the lessons learned from operating a hand-rolled database at production scale with paying customers, and see how it compares to other solutions for real-time, sub-second complex queries over large data volumes.

Syllabus

Intro
Please meet Retriever
Retriever is a special purpose data store
What is Honeycomb?
How Honeycomb works
Honeycomb under the hood
Our requirements
Requirements - summary
Retriever at a glance
Retriever compared to Scuba
Architecture - write path
Architecture - read path
Data model - datasets
Data model - events
Row oriented storage
Column oriented storage
Storage Format - timestamp column
Storage Format - reading
Distributed queries
Distributed reads - calculations
Distributed reads - fanout
Detour - Kafka
Ingestion
Quota management
Fault tolerance
Failure recovery
Bootstrapping new nodes

Taught by

Strange Loop Conference
