Declarative ETL Pipelines with Delta Live Tables - Modern Software Engineering for Data Analysts and Engineers
Offered By: SQLBits via YouTube
Course Description
Overview
Syllabus
Intro
What is a Streaming Live Table? Based on Spark™ Structured Streaming
Development vs Production: Fast iteration or enterprise-grade reliability
Choosing pipeline boundaries: Break up pipelines at natural external divisions.
Pitfall: hard-coded sources & destinations. Hard-coding the source and destination makes it impossible to test changes outside of production, breaking CI/CD.
Ensure correctness with Expectations: Expectations are tests that ensure data quality in production.
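Conceptually, an expectation attaches a boolean constraint to a table, drops or flags violating rows, and records how many rows failed. A minimal standalone sketch of that behavior in plain Python (illustrative names, not the DLT API):

```python
# Sketch of what an "expect or drop" expectation does: keep rows that
# satisfy a predicate and count the violations for quality reporting.

def expect_or_drop(rows, name, predicate):
    """Return (passing_rows, violation_metrics) for one expectation."""
    passing = [r for r in rows if predicate(r)]
    return passing, {name: len(rows) - len(passing)}

rows = [{"id": 1, "amount": 10}, {"id": None, "amount": 5}]
clean, metrics = expect_or_drop(rows, "valid_id", lambda r: r["id"] is not None)
# clean keeps only the row with a non-null id; metrics records one violation
```

In DLT itself the predicate is written as a SQL expression on the table definition, and violation counts surface in the pipeline's event log.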
Expectations using the power of SQL: Use SQL aggregates and joins to perform complex validations.
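Aggregate-style validations can be expressed as ordinary SQL queries over the data. The sketch below runs two such checks against an in-memory SQLite table for illustration only; in a pipeline the same SQL would be evaluated over a Delta table (table and column names are made up):

```python
# Two SQL validations: a row-level rule (no negative amounts) and an
# aggregate rule (the total reconciles). SQLite stands in for the real table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 10.0), (2, 25.0), (3, -5.0)])

bad_rows = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE amount < 0").fetchone()[0]
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
# bad_rows counts the one negative order; total sums all amounts
```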
Using Python: Write advanced DataFrame code and UDFs.
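A UDF is ultimately just a Python function applied to column values; in a pipeline it would be registered with Spark (for example via `pyspark.sql.functions.udf`), but the logic itself is plain Python and unit-testable on its own. A hedged sketch with an invented cleanup rule:

```python
# Illustrative UDF body: normalize free-form country codes to a canonical
# form. The alias table is made up for the example.

def normalize_country(code: str) -> str:
    aliases = {"uk": "GB", "england": "GB", "usa": "US"}
    cleaned = code.strip().lower()
    return aliases.get(cleaned, cleaned.upper())

print(normalize_country(" UK "))  # canonicalizes to GB
print(normalize_country("de"))    # passes through, uppercased
```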
Installing libraries with pip: pip is the package installer for Python.
Best Practice: Integrate using the event log. Use the information in the event log with your existing operational tools.
DLT Automates Failure Recovery: Transient issues are handled by built-in retry logic.
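The idea behind that built-in recovery is the familiar retry-with-backoff pattern, sketched here in plain Python (this is a concept illustration, not DLT's internal implementation):

```python
# Retry a failing step with exponential backoff, re-raising once the
# attempt budget is exhausted.
import time

def run_with_retries(step, max_attempts=3, base_delay=0.01):
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception:
            if attempt == max_attempts:
                raise  # retries exhausted: surface the failure
            time.sleep(base_delay * 2 ** (attempt - 1))

calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient network error")
    return "ok"

result = run_with_retries(flaky)  # succeeds on the third attempt
```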
Modularize your code with configuration: Avoid hard-coding paths, topic names, and other constants in your code.
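A sketch of pulling such constants from configuration rather than code. In a DLT pipeline these would come from the pipeline's settings (read with `spark.conf.get`); here a plain dict stands in for that store, and the paths and topic names are invented for the example:

```python
# Constants live in configuration, so swapping environments means
# changing settings, not code. All values below are illustrative.
PIPELINE_CONF = {
    "source.path": "/mnt/raw/events",
    "target.table": "events_bronze",
    "kafka.topic": "events",
}

def get_conf(key, default=None):
    return PIPELINE_CONF.get(key, default)

source = get_conf("source.path")
topic = get_conf("kafka.topic")
```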
Workflow Orchestration for Triggered DLT Pipelines
Use Delta for infinite retention: Delta provides cheap, elastic, and governable storage for transient sources.
Taught by
SQLBits
Related Courses
Data Lakes for Big Data (EdCast)
Distributed Computing with Spark SQL (University of California, Davis via Coursera)
Modernizing Data Lakes and Data Warehouses with Google Cloud (Google Cloud via Coursera)
Data Engineering with AWS (Udacity)
Preparing for Google Cloud Certification: Cloud Data Engineer (Google Cloud via Coursera)