Data Quality Tools Comparison for Continuous Data Imports

Offered By: Databricks via YouTube

Tags

Apache Spark Courses
Anomaly Detection Courses
Data Pipelines Courses
ETL Courses
Data Profiling Courses

Course Description

Overview

Explore open-source solutions for ensuring data quality in continuous import scenarios in this 28-minute presentation from Databricks. Compare popular options like Apache Griffin, Deequ, DDQ, and Great Expectations across dimensions such as maturity, documentation, extensibility, and features including data profiling and anomaly detection. Learn about various data quality approaches, tools, and frameworks, including ETL processes, quality checks, code generation, and advanced uniqueness checks. Gain insights into the limitations of Apache Griffin and discover how to implement timely data quality assurance in your organization's data pipeline.
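
As a rough illustration of the kind of quality checks the talk covers (completeness and uniqueness on each imported batch), here is a minimal sketch in plain Python. It does not use the API of Apache Griffin, Deequ, DDQ, or Great Expectations; the function names, the `Row` alias, and the sample batch are hypothetical and chosen only for this example.

```python
from collections import Counter
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical row representation: one dict per imported record.
Row = Dict[str, Any]


def completeness(rows: Sequence[Row], column: str) -> float:
    """Return the fraction of rows where `column` is present and not None."""
    if not rows:
        # An empty batch has no missing values; treat it as fully complete.
        return 1.0
    filled = sum(1 for row in rows if row.get(column) is not None)
    return filled / len(rows)


def duplicate_keys(rows: Sequence[Row], key_columns: Sequence[str]) -> List[Tuple[Any, ...]]:
    """Return the key tuples that occur more than once in the batch."""
    counts = Counter(tuple(row.get(col) for col in key_columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Made-up sample batch, standing in for one continuous-import increment.
batch = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 2, "email": "b@example.com"},
    {"id": 3, "email": "c@example.com"},
]

email_completeness = completeness(batch, "email")
repeated_ids = duplicate_keys(batch, ["id"])
```

In a continuous import, checks like these would typically run on every incoming batch, with the results compared against a threshold before the data is accepted downstream; the tools compared in the talk provide this kind of check at Spark scale.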

Syllabus

Intro
Data Quality
ETL Process
Quality Checks
Data Quality Approaches
Data Quality Tools
Deequ
Code Generation
Great Expectations
Pandas Profiling
Apache Griffin
Apache Griffin Limitations
Examples
Uniqueness checks
Advanced checks
Timely data
Other frameworks


Taught by

Databricks

Related Courses

CS115x: Advanced Apache Spark for Data Science and Data Engineering
University of California, Berkeley via edX
Big Data Analytics
University of Adelaide via edX
Big Data Essentials: HDFS, MapReduce and Spark RDD
Yandex via Coursera
Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames
Yandex via Coursera
Introduction to Apache Spark and AWS
University of London International Programmes via Coursera