Data Engineering Pipeline Management with Apache Airflow
Offered by: LinkedIn Learning
Course Description
Overview
Explore how to work with role-based access control, manage service-level agreements (SLAs), schedule DAGs with datasets, build Airflow plugins, and scale Airflow.
Syllabus
Introduction
- Features for data engineering pipeline management
- Prerequisites
- Quick install overview
- Creating an admin user and exploring roles
- Creating users with different roles
- Executing a simple branching DAG
- Executing a simple SQL DAG
- The Public and Viewer roles
- The User role
- The Op role
- Actions, resources, and permissions
- Adding permissions to the Public role
- Creating and configuring a custom role
- Configuring emails for SLA management
- Configuring task-level SLAs
- Triggering and viewing SLA misses
- Configuring DAG-level SLAs
- Configuring DAG failed action
- Dataset producer pipeline
- Dataset consumer pipeline
- Data-aware scheduling
- Purchases producer pipeline and join pipeline
- Data-aware scheduling with multiple datasets
- Introducing plugins
- Adding menu items using plugins
- Exploring the CSV reader plugin
- Implementing the CSV reader plugin
- Scaling Apache Airflow
- Basic setup for the transformation pipeline
- DAG for the transformation pipeline
- Installing RabbitMQ on macOS and Linux
- Setting up an admin user for RabbitMQ
- Configuring the CeleryExecutor for Airflow
- Executing tasks on a single Celery worker
- Executing tasks on multiple Celery workers
- Assigning tasks to queues
- Summary and next steps
Taught by
Janani Ravi