Version Control for Lakehouse Architecture - Essential Practices and Benefits
Offered By: Databricks via YouTube
Course Description
Overview
Discover how to implement engineering best practices for data products using data version control with lakeFS in this 15-minute conference talk sponsored by lakeFS. Learn why version control is essential for your lakehouse architecture when developing and maintaining data/ML pipelines using Databricks. Explore techniques to improve data quality and velocity, including experimenting during development, testing data quality in isolation, automating quality validation tests, and achieving full reproducibility of data pipelines. Understand how poor data quality or lack of reproducibility can impact products relying on analytics or machine learning. Gain insights from Oz Katz, CTO & Co-creator of lakeFS, on implementing data version control to enhance your data products. Additional resources on the Rise of the Data Lakehouse and Lakehouse Fundamentals Training are provided for further exploration.
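The talk's core pattern, testing pipeline changes on an isolated branch before merging to production, can be sketched in a few lines of PySpark. Below is a minimal, illustrative example, not code from the talk: it assumes a lakeFS server exposing its S3-compatible gateway, a repository named "lakehouse-repo", and an "experiment" branch already created from main (for example with lakectl). The endpoint, credentials, paths, and quality rule are all placeholder assumptions.

```python
# Sketch: running a pipeline change in isolation on a lakeFS branch.
# Assumes an "experiment" branch created beforehand, e.g.:
#   lakectl branch create lakefs://lakehouse-repo/experiment \
#       --source lakefs://lakehouse-repo/main
# Repository, branch, endpoint, and column names are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder.appName("branch-isolated-quality-check")
    # Point S3A at the lakeFS gateway instead of AWS S3
    # (endpoint and credentials are placeholders).
    .config("spark.hadoop.fs.s3a.endpoint", "https://lakefs.example.com")
    .config("spark.hadoop.fs.s3a.access.key", "<lakefs-access-key>")
    .config("spark.hadoop.fs.s3a.secret.key", "<lakefs-secret-key>")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .getOrCreate()
)

# lakeFS object paths follow s3a://<repository>/<branch>/<key>, so the same
# pipeline code targets main or an experiment branch by changing the prefix.
source = spark.read.format("delta").load("s3a://lakehouse-repo/main/events")
transformed = source.withColumn("event_date", F.to_date("event_ts"))

# Write results to the experiment branch; main is untouched.
transformed.write.format("delta").mode("overwrite").save(
    "s3a://lakehouse-repo/experiment/events_clean"
)

# A simple automated quality gate: block the merge if nulls slipped in.
null_count = transformed.filter(F.col("event_date").isNull()).count()
assert null_count == 0, f"{null_count} rows have null event_date; do not merge"
```

If the validation passes, the branch can be committed and merged back into main (e.g. lakectl merge lakefs://lakehouse-repo/experiment lakefs://lakehouse-repo/main), giving the commit-and-merge reproducibility the talk describes.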
Syllabus
Sponsored by: lakeFS | Why Version Control is Essential for Your Lakehouse Architecture
Taught by
Databricks
Related Courses
Multi-Table Transactions with LakeFS and Delta Lake - Tech Talk (Databricks via YouTube)
CI/CD for Data - Building Dev/Test Data Environments with Open Source Stacks (CNCF [Cloud Native Computing Foundation] via YouTube)
Building Reproducible ML Processes with an Open Source Stack (Linux Foundation via YouTube)
Power Up Your Lakehouse with Git Semantics and Delta Lake (Databricks via YouTube)
Developing Data Pipelines with Branch Deployments - A New Approach (Databricks via YouTube)