Patterns and Operational Insights for Large-Scale Delta Lake Workloads

Offered By: Databricks via YouTube

Tags

Delta Lake, Big Data, Schema Design, Data Engineering

Course Description

Overview

Explore effective patterns and operational insights from early adopters of Delta Lake in this 42-minute conference talk. Discover how to handle demanding workloads over large volumes of log and telemetry data for cyber threat detection and response, covering streaming ETL, data enrichment, analytic workloads, and large materialized aggregates for fast answers.

Dive into Z-ordering optimization techniques, including schema design considerations and the 32-column default limit on indexed statistics. Understand the implications of date partitioning under long-tail distributions and unsynchronized clocks, and gain insights into optimization strategies, including when to use auto-optimize.

Explore upsert patterns that simplify important jobs, and learn how to tune Delta Lake for very large tables and low-latency access. Benefit from real-world experience operating large-scale workloads on Databricks and Delta Lake, covering the Parse Framework, merge operations, stateful processing, scaling, schema ordering, partitioning, and handling conflicting transactions.
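To make the Z-ordering discussion concrete, here is a minimal, illustrative sketch of the idea behind it: a Z-order (Morton) curve interleaves the bits of several column values so that rows close in *all* Z-ordered columns land close together in the sort order, which lets file-level min/max statistics skip more data. This is a conceptual toy only, not Delta Lake's actual implementation; the function name `z_order_key` is a hypothetical helper for illustration.

```python
def z_order_key(values, bits=8):
    """Interleave the bits of each value (most significant bit first)
    to produce a Morton / Z-order sort key."""
    key = 0
    for bit in range(bits - 1, -1, -1):
        for v in values:
            key = (key << 1) | ((v >> bit) & 1)
    return key

# Rows keyed by two columns; sorting by the interleaved key clusters rows
# that are close in BOTH dimensions, unlike a lexicographic sort on (x, y).
rows = [(x, y) for x in range(4) for y in range(4)]
rows.sort(key=lambda r: z_order_key(r, bits=2))
# The sorted order walks the classic Z-shaped curve:
# (0,0), (0,1), (1,0), (1,1), (0,2), ...
```

In Delta Lake itself this clustering is requested declaratively (e.g. `OPTIMIZE ... ZORDER BY (col1, col2)` on Databricks), and the 32-column default mentioned in the talk refers to how many leading schema columns have statistics collected for data skipping, which is why schema ordering matters.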

Syllabus

Introduction
Parse Framework
Merge
Stateful Processing
Merged Tables
Scaling
Schema Ordering
Partitioning
Conflicting Transactions
Metadata


Taught by

Databricks

Related Courses

Distributed Computing with Spark SQL
University of California, Davis via Coursera
Apache Spark (TM) SQL for Data Analysts
Databricks via Coursera
Building Your First ETL Pipeline Using Azure Databricks
Pluralsight
Implement a data lakehouse analytics solution with Azure Databricks
Microsoft via Microsoft Learn
Perform data science with Azure Databricks
Microsoft via Microsoft Learn