YoVDO

Breaking Out of the Proprietary Cage - Real-time Data Warehouses in Open Source

Offered By: Linux Foundation via YouTube

Tags

Data Warehousing Courses, SQL Courses, Kubernetes Courses, Grafana Courses, Materialized Views Courses, ClickHouse Courses

Course Description

Overview

This 51-minute Linux Foundation conference talk surveys open-source real-time data warehouses. It covers what distinguishes analytic applications and SQL data warehouses, focusing on ClickHouse and its MergeTree table engine: data layout, storage optimization, and parallelized execution. It then turns to materialized views, tiered storage, and distributed queries, and presents patterns for Kafka-based ingestion pipelines, Grafana visualization, and operation on Kubernetes. The talk is aimed at those who want to move off proprietary solutions and use open-source technologies for data warehousing and analytics.

Syllabus

Intro
What makes analytic applications special?
SQL data warehouses run analytic queries
What ClickHouse is not
Merge Tree is the workhorse table engine
Merge Tree data layout
Detailed storage layout within a single part (/var/lib/clickhouse/data/airline/ontime)
Adding CPUs boosts parallelized execution
Effect on storage is dramatic
Materialized views restructure/reduce data
Alternative pattern: Tiered storage
How do distributed queries work?
Pattern: Kafka-based ingestion pipelines
Alternative ingest pattern: Kafka engine
Pattern: Grafana visualization
Pattern: Operation on Kubernetes


Taught by

Linux Foundation

Related Courses

A Beginner’s Guide to Docker
Packt via FutureLearn
A Beginner's Guide to Kubernetes for Container Orchestration
Packt via FutureLearn
A Practical Guide to Amazon EKS
A Cloud Guru
Advanced Networking with Kubernetes on AWS
A Cloud Guru
AIOps Essentials (Autoscaling Kubernetes with Prometheus Metrics)
A Cloud Guru