Data Mesh in Practice - From Data Lake to Distributed Architecture at Zalando
Offered By: Databricks via YouTube
Course Description
Overview
Explore how Europe's leading online fashion platform transitioned from a centralized Data Lake to a distributed Data Mesh architecture in this 30-minute talk. Learn about the challenges of the Data Lake paradigm, including unclear responsibilities, lack of data ownership, and poor data availability. Discover how Zalando addressed these issues by implementing a decentralized, domain-focused approach that empowers data owners and promotes the concept of Data Products. Gain insights into the journey of building a Data Mesh architecture backed by Spark and Delta Lake, and understand ongoing efforts to simplify data product creation. Examine topics such as domain-driven distributed architecture, self-service data infrastructure, and the "Bring Your Own Bucket" concept. Delve into strategies for ensuring data quality through consumer-producer contracts and learn about central services with global interoperability in this informative presentation from Databricks.
Syllabus
Intro
Legacy Analytics
Legacy Evolving
Zalando's Data Lake
Centralization Challenges
A Recurring Pattern
What is Data Mesh?
Domain-Driven Distributed Architecture... applied to Data
backed by domain-agnostic self-service data infrastructure
It's a mindset shift
Bring Your Own Bucket (BYOB)
Central Processing Platform
Simplify Data Sharing
Central Services with Global Interoperability
How to Ensure Data Quality?
Data Quality - A Contract between Consumer and Producer
Taught by
Databricks
Related Courses
Data Lakes for Big Data (EdCast)
Distributed Computing with Spark SQL (University of California, Davis via Coursera)
Modernizing Data Lakes and Data Warehouses with Google Cloud (Google Cloud via Coursera)
Data Engineering with AWS (Udacity)
Preparing for Google Cloud Certification: Cloud Data Engineer (Google Cloud via Coursera)