YoVDO

Data Discovery at Databricks with Amundsen - Improving Productivity and Trust

Offered By: Databricks via YouTube

Tags

Data Governance Courses
Databricks Courses
Data Lineage Courses

Course Description

Overview

Explore how Databricks uses Amundsen, an open-source data discovery tool, to improve productivity and trust in internal data exploration. Learn how Amundsen is integrated with Databricks' infrastructure to surface metadata, including popular tables, fuzzy and faceted search, and rich dataset information such as lineage, ownership, and usage statistics. Discover how the tool provides insights into ETL jobs, column statistics, and associated dashboards. Learn how user feedback is being incorporated and about plans to extend these discovery improvements to Databricks customers. Delve into the deployment details, development process, and specific metadata surfaced in Amundsen, including table lineage generation, Delta table extended metadata, and Redash dashboard integration. Understand the benefits of this data discovery solution compared to the previous static wiki approach and its potential impact on data-driven decision-making within the organization.

Syllabus

Intro
Data-Driven Decisions
Data Discovery Not Productive
What is Amundsen
Dataset detail page
Lineage between dashboards and dataset
Search for existing dashboards/reports
Dashboard detail page
Search for co-workers
Central data quality issue portal
Data Preview
Databricks Lakehouse
Deployment details
Development
Metadata surfaced in Amundsen
Lineage information
What is table lineage
How is the lineage table generated?
Statistics information
Delta table extended metadata
Notebook structure
Redash dashboards
Sample data
Amundsen Open Source


Taught by

Databricks

Related Courses

Data Processing with Azure
LearnQuest via Coursera
Mejores prácticas para el procesamiento de datos en Big Data
Coursera Project Network via Coursera
Data Science with Databricks for Data Analysts
Databricks via Coursera
Azure Data Engineer con Databricks y Azure Data Factory
Coursera Project Network via Coursera
Curso Completo de Spark con Databricks (Big Data)
Coursera Project Network via Coursera