
Automated Metadata Management in Data Lakes - A CI/CD Driven Approach

Offered By: Databricks via YouTube

Tags

Data Lakes, Git, CI/CD, YAML, Data Governance, Data Engineering

Course Description

Overview

Explore a 28-minute conference talk on automating metadata management in data lakes with a CI/CD-driven approach. Learn how engineers at Northwestern Mutual built a tool that allows rapid metadata changes while enforcing the validation needed to keep downstream systems stable. Discover the architecture and design of their centralized, Git-managed repository of data schemas, which is built on YAML structures and CI/CD capabilities. Gain insights into maintaining information about databases, tables, and views, including schema, ownership, PII, and descriptions. Watch a live demo of creating a new table and promoting it to production through CI/CD, and see how the tool can be used effectively by people with minimal Spark knowledge.
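
As a rough illustration of the kind of validation a CI step could run before a metadata change is promoted, here is a minimal Python sketch. It is not the tool presented in the talk: the field names (database, table, owner, description, columns, name, type, pii) and the function validate_table are hypothetical, chosen only to mirror the schema, ownership, PII, and description information mentioned above. The dictionary stands in for a table definition after its YAML file has been parsed.

```python
from typing import Any

# Hypothetical required fields; the real YAML layout used in the talk is not
# specified in this listing.
REQUIRED_TABLE_KEYS = ("database", "table", "owner", "description", "columns")
REQUIRED_COLUMN_KEYS = ("name", "type", "pii")


def validate_table(meta: dict[str, Any]) -> list[str]:
    """Return a list of readable problems; an empty list means the entry passed."""
    errors: list[str] = []

    # Table-level checks: every required field must be present and non-empty.
    for key in REQUIRED_TABLE_KEYS:
        if not meta.get(key):
            errors.append(f"missing or empty table field: {key}")

    # Column-level checks: required fields, unique names, boolean PII flag.
    seen_names: set[str] = set()
    for i, col in enumerate(meta.get("columns") or []):
        for key in REQUIRED_COLUMN_KEYS:
            if key not in col:
                errors.append(f"column {i}: missing field: {key}")
        name = col.get("name")
        if name is not None:
            if name in seen_names:
                errors.append(f"column {i}: duplicate name: {name}")
            seen_names.add(name)
        if "pii" in col and not isinstance(col["pii"], bool):
            errors.append(f"column {i}: pii must be true or false")

    return errors


# A made-up table definition, as it might look after parsing a YAML file.
example = {
    "database": "sales",
    "table": "orders",
    "owner": "data-eng@example.com",
    "description": "One row per customer order.",
    "columns": [
        {"name": "order_id", "type": "bigint", "pii": False},
        {"name": "email", "type": "string", "pii": True},
    ],
}
```

In a pipeline, a step like this could call validate_table on every changed definition and fail the build if any list of problems is non-empty, so that invalid metadata never reaches production.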

Syllabus

Introduction
About Northwestern Mutual
Agenda
Need for metadata management
Ease of use
Design
Configuration Files
Demo
CI/CD
Wrap Up


Taught by

Databricks

Related Courses

Introduction Pratique à YAML
Coursera Project Network via Coursera
Ansible Automation For Beginners to Advance - Step by Step
Udemy
Kubernetes for Developers: Deploying Your Code
Pluralsight
Continuous Delivery and DevOps with Azure DevOps: Managing Builds
Pluralsight
Automating Infrastructure Deployment Using Google Cloud Deployment Manager
Pluralsight