Automated Metadata Management in Data Lakes - A CI/CD Driven Approach
Offered By: Databricks via YouTube
Course Description
Overview
Explore a 28-minute conference talk on implementing automated metadata management in data lakes using a CI/CD-driven approach. Learn how Northwestern Mutual engineers developed a tool to balance rapid metadata changes with robust validation for downstream system stability. Discover the architecture and design of their centralized git-managed repository for data schemas, utilizing YAML structures and CI/CD capabilities. Gain insights into maintaining information on databases, tables, and views, including schema, ownership, PII, and descriptions. Watch a live demo of creating a new table with CI/CD promotion to production, and understand how this tool can be used effectively by individuals with minimal Spark knowledge.
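As a rough illustration of the idea described above, the sketch below shows how a CI step might validate a table definition before it is promoted. This is not the tool presented in the talk. The field names (`database`, `table`, `owner`, `description`, `columns`, `pii`), the list of allowed types, and the `validate_table` helper are all assumptions made for this example. The definition is written as the Python dict a YAML parser would produce, so the snippet runs without extra dependencies.

```python
# Hypothetical table definition. In a git-managed repository this would
# live in a YAML file along these lines (layout assumed for illustration):
#
#   database: sales
#   table: orders
#   owner: data-eng@example.com
#   description: One row per customer order.
#   columns:
#     - name: order_id
#       type: bigint
#       pii: false
#       description: Unique order identifier.
#     - name: customer_email
#       type: string
#       pii: true
#       description: Email address used at checkout.

# Assumed rules; a real repository would define its own.
REQUIRED_TABLE_KEYS = ("database", "table", "owner", "description", "columns")
REQUIRED_COLUMN_KEYS = ("name", "type", "pii", "description")
ALLOWED_TYPES = {"bigint", "int", "string", "double", "boolean", "date", "timestamp"}


def validate_table(definition):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for key in REQUIRED_TABLE_KEYS:
        if key not in definition:
            errors.append(f"missing table field: {key}")

    seen = set()
    for i, column in enumerate(definition.get("columns", [])):
        name = column.get("name", f"<column {i}>")
        for key in REQUIRED_COLUMN_KEYS:
            if key not in column:
                errors.append(f"{name}: missing column field: {key}")
        if "type" in column and column["type"] not in ALLOWED_TYPES:
            errors.append(f"{name}: unsupported type: {column['type']}")
        # PII must be an explicit boolean so downstream systems can rely on it.
        if "pii" in column and not isinstance(column["pii"], bool):
            errors.append(f"{name}: pii must be true or false")
        if name in seen:
            errors.append(f"{name}: duplicate column name")
        seen.add(name)
    return errors


# The dict form of the YAML shown in the comment above.
orders = {
    "database": "sales",
    "table": "orders",
    "owner": "data-eng@example.com",
    "description": "One row per customer order.",
    "columns": [
        {"name": "order_id", "type": "bigint", "pii": False,
         "description": "Unique order identifier."},
        {"name": "customer_email", "type": "string", "pii": True,
         "description": "Email address used at checkout."},
    ],
}

problems = validate_table(orders)
```

In a pipeline, a non-empty `problems` list would fail the build, so an invalid definition never reaches production. Someone editing the YAML file would not need to write any Spark code to get that check.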
Syllabus
Introduction
About Northwestern Mutual
Agenda
Need for metadata management
Ease of use
Design
Configuration Files
Demo
CI/CD
Wrap Up
Taught by
Databricks
Related Courses
内存数据库管理 (In-Memory Database Management)
openHPI
CS115x: Advanced Apache Spark for Data Science and Data Engineering
University of California, Berkeley via edX
Processing Big Data with Azure Data Lake Analytics
Microsoft via edX
Google Cloud Big Data and Machine Learning Fundamentals en Español (in Spanish)
Google Cloud via Coursera
Google Cloud Big Data and Machine Learning Fundamentals 日本語版 (Japanese edition)
Google Cloud via Coursera