YoVDO

Dumb-Proofing Data Pipelines: Techniques for Configurable and Maintainable ETL - Databricks

Offered By: Databricks via YouTube

Tags

Data Pipelines Courses
Software Development Courses
Scala Courses
Databricks Courses
JSON Courses
Data Engineering Courses
Configuration Management Courses
Input Validation Courses

Course Description

Overview

Discover techniques for building robust, maintainable data pipelines in this 22-minute Databricks talk. Learn why configurable pipelines matter, how to promote them across environments, and how to reconfigure them in production without recompiling. Explore the pros and cons of Databricks Notebook widgets, ways to externalize configuration, and how Scala features combined with the PureConfig and Typesafe Config libraries yield boilerplate-free configuration code. Gain insights on input validation as a means of preventing data loss and corruption and ensuring data correctness. Walk away with practical knowledge you can apply to developing and maintaining your own data pipelines.
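
As a rough illustration of the externalized-configuration and input-validation ideas described above, the sketch below loads pipeline settings from JSON text and rejects bad values before any data is read or written. It is not taken from the talk, which works in Scala with dedicated configuration libraries; this is a Python standard-library analogue, and the names `PipelineConfig`, `load_config`, `source_path`, `target_path`, and `max_bad_records` are hypothetical.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical settings for one pipeline run; field names are illustrative."""
    source_path: str
    target_path: str
    max_bad_records: int


def load_config(text: str) -> PipelineConfig:
    """Parse JSON configuration text and validate it before the pipeline runs."""
    raw = json.loads(text)
    if not isinstance(raw, dict):
        raise ValueError("configuration must be a JSON object")

    # Fail fast on missing or misspelled keys instead of silently using defaults.
    expected = {f.name for f in fields(PipelineConfig)}
    missing = expected - set(raw)
    unknown = set(raw) - expected
    if missing or unknown:
        raise ValueError(
            f"missing keys: {sorted(missing)}, unknown keys: {sorted(unknown)}"
        )

    config = PipelineConfig(**raw)

    # Validate each value so a bad setting cannot corrupt or overwrite data.
    for name in ("source_path", "target_path"):
        value = getattr(config, name)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"{name} must be a non-empty string")
    if config.source_path == config.target_path:
        raise ValueError("source_path and target_path must differ")
    bad = config.max_bad_records
    if isinstance(bad, bool) or not isinstance(bad, int) or bad < 0:
        raise ValueError("max_bad_records must be a non-negative integer")
    return config
```

Because the settings live outside the code, the same pipeline could in principle be pointed at a different environment by passing `load_config` the contents of a different file, with no code change or rebuild.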

Syllabus

Intro
Why make your data pipelines dumb-proof?
How to make your data pipelines dumb-proof?
Fixing Hard-Coded Data Pipelines
Parameters & Input Validation
Externalizing Configuration
Configuration in JSON Format
Optimized Configuration in HOCON format
Readable and maintainable Configuration
Configuration Library
Refactor Code - Loading and Parsing Configuration
Boilerplate-free configuration code
Sample Code
Summary


Taught by

Databricks

Related Courses

Google Cloud Big Data and Machine Learning Fundamentals en Español (Google Cloud via Coursera)
Data Analysis with Python (IBM via Coursera)
Intro to TensorFlow 日本語版 (Google Cloud via Coursera)
TensorFlow on Google Cloud - Français (Google Cloud via Coursera)
Freedom of Data with SAP Data Hub (SAP Learning)