Data Mesh in Practice - From Data Lake to Distributed Architecture at Zalando

Offered By: Databricks via YouTube

Course Description

Overview

Explore how Zalando, Europe's leading online fashion platform, transitioned from a centralized Data Lake to a distributed Data Mesh architecture in this 30-minute talk. Learn about the challenges of the Data Lake paradigm, including unclear responsibilities, lack of data ownership, and poor data availability. Discover how Zalando addressed these issues with a decentralized, domain-focused approach that empowers data owners and treats data sets as Data Products. Gain insights into the journey of building a Data Mesh architecture backed by Spark and Delta Lake, and understand ongoing efforts to simplify data product creation. Examine topics such as domain-driven distributed architecture, self-service data infrastructure, and the "Bring Your Own Bucket" concept. Delve into strategies for ensuring data quality through contracts between consumers and producers, and learn about central services with global interoperability in this presentation from Databricks.
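To make the idea of a contract between consumer and producer more concrete, here is a minimal sketch in plain Python. It is not taken from the talk and does not reflect Zalando's actual tooling: the `ColumnRule`, `DataContract`, and `validate` names, as well as the example product and team names, are hypothetical and chosen only for illustration. The sketch assumes a contract is simply a list of columns with expected types and nullability that a producer checks before publishing a batch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnRule:
    """One column the producer promises to deliver (hypothetical structure)."""
    name: str
    dtype: type
    nullable: bool = False


@dataclass(frozen=True)
class DataContract:
    """Hypothetical agreement between a data product's owner and its consumers."""
    product: str
    owner: str
    columns: tuple


def validate(contract, rows):
    """Return a list of violations; an empty list means the batch conforms."""
    violations = []
    for i, row in enumerate(rows):
        for col in contract.columns:
            if col.name not in row:
                violations.append(f"row {i}: missing column '{col.name}'")
                continue
            value = row[col.name]
            if value is None:
                if not col.nullable:
                    violations.append(f"row {i}: '{col.name}' must not be null")
            elif not isinstance(value, col.dtype):
                violations.append(
                    f"row {i}: '{col.name}' expected {col.dtype.__name__}, "
                    f"got {type(value).__name__}"
                )
    return violations


# Illustrative contract for an invented "orders" data product.
orders_contract = DataContract(
    product="orders",
    owner="checkout-team",
    columns=(
        ColumnRule("order_id", str),
        ColumnRule("amount", float),
        ColumnRule("coupon", str, nullable=True),
    ),
)

if __name__ == "__main__":
    batch = [
        {"order_id": "a1", "amount": 9.5, "coupon": None},
        {"order_id": None, "amount": "9.5"},
    ]
    for problem in validate(orders_contract, batch):
        print(problem)
```

In a setup like the one the talk describes, a check of this kind would more plausibly run as a Spark job against a Delta Lake table; the pure-Python version only illustrates the shape of the agreement, with the producing domain owning both the data and the guarantees it publishes.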

Syllabus

Intro
Legacy Analytics
Legacy Evolving
Zalando's Data Lake
Centralization Challenges
A Recurring Pattern
What is Data Mesh?
Domain-Driven Distributed Architecture... applied to Data
backed by domain-agnostic self-service data infrastructure
It's a mindset shift
Bring Your Own Bucket (BYOB)
Central Processing Platform
Simplify Data Sharing
Central Services with Global Interoperability
How to Ensure Data Quality?
Data Quality - A Contract between Consumer and Producer


Taught by

Databricks