Modern Data Orchestration: Best Practices and Real-World Use Cases
Offered By: The ASF via YouTube
Course Description
Overview
Explore advanced techniques and best practices for building better data pipelines in this practical talk. Dive into real-world use cases, examining patterns for data pipelines that use Airflow with Spark, dbt, and Polars. Learn strategies to avoid managing dependencies inside Airflow and to reuse DAG templates across your organization. Delve into the fundamental concepts of data pipelines, including data lineage, observability, metadata, quality, and auditing, and discover how to integrate these elements effectively. Learn to write clean code for data pipelines using the Factory Design Pattern with spark-submit, Airflow, and KubernetesPodOperator. Gain insights into Airflow alternatives such as Dagster and Mage for your data architecture. Led by Riccardo Amadio, a Senior Data Engineer at Agile Lab, this 26-minute presentation offers a no-nonsense approach to modern data orchestration.
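To give a feel for the Factory Design Pattern mentioned in the overview, here is a minimal sketch in plain Python. It is an illustration only, not code from the talk: `JobSpec`, `TaskBuilder`, `SparkSubmitBuilder`, `KubernetesPodBuilder`, and `make_task` are hypothetical names, and the builders return plain dictionaries instead of real Airflow operators so the example runs without Airflow installed. The assumption is that one job description can be rendered for different runtimes by picking a builder by name.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class JobSpec:
    """Hypothetical, runtime-agnostic description of one pipeline step."""
    name: str
    entrypoint: str
    args: list = field(default_factory=list)


class TaskBuilder(ABC):
    """Common interface every concrete builder implements."""

    @abstractmethod
    def build(self, spec: JobSpec) -> dict:
        ...


class SparkSubmitBuilder(TaskBuilder):
    def build(self, spec: JobSpec) -> dict:
        # Render the step as a spark-submit command line.
        command = ["spark-submit", "--name", spec.name, spec.entrypoint, *spec.args]
        return {"kind": "spark_submit", "command": command}


class KubernetesPodBuilder(TaskBuilder):
    def __init__(self, image: str):
        # The container image is an assumption supplied by the caller.
        self.image = image

    def build(self, spec: JobSpec) -> dict:
        # Render the step as the settings a pod-based task would need.
        return {
            "kind": "kubernetes_pod",
            "name": spec.name,
            "image": self.image,
            "cmds": ["python", spec.entrypoint],
            "arguments": list(spec.args),
        }


# Registry mapping a runtime name to the builder class that handles it.
_BUILDERS = {
    "spark_submit": SparkSubmitBuilder,
    "kubernetes_pod": KubernetesPodBuilder,
}


def make_task(runtime: str, spec: JobSpec, **options) -> dict:
    """Factory function: choose a builder by runtime name and build the task."""
    try:
        builder_cls = _BUILDERS[runtime]
    except KeyError:
        raise ValueError(f"unknown runtime: {runtime!r}") from None
    return builder_cls(**options).build(spec)


spec = JobSpec("daily_sales", "jobs/sales.py", ["--date", "2024-01-01"])
spark_task = make_task("spark_submit", spec)
pod_task = make_task("kubernetes_pod", spec, image="example/sales-job:latest")
```

In an actual Airflow project, each builder would more likely return an operator instance than a dictionary. Keeping the choice of runtime behind a single factory function is what would let the same DAG template be reused across teams.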
Syllabus
Modern Data Orchestrators
Taught by
The ASF
Related Courses
CS115x: Advanced Apache Spark for Data Science and Data Engineering
University of California, Berkeley via edX
Big Data Analytics
University of Adelaide via edX
Big Data Essentials: HDFS, MapReduce and Spark RDD
Yandex via Coursera
Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames
Yandex via Coursera
Introduction to Apache Spark and AWS
University of London International Programmes via Coursera