YoVDO

Automate Data Pipelines

Offered By: Udacity

Tags

Big Data Courses, Amazon Web Services (AWS) Courses, Data Pipelines Courses, Data Lineage Courses

Course Description

Overview

In this course, you'll build pipelines leveraging Airflow DAGs to organize your tasks along with AWS resources such as S3 and Redshift.

Syllabus

  • Introduction to Automating Data Pipelines
    • Welcome to Automating Data Pipelines. In this lesson, you'll be introduced to the topic, prerequisites for the course, and the environment and tools you'll be using to build data pipelines.
  • Data Pipelines
    • In this lesson, you'll learn about the components of a data pipeline, including Directed Acyclic Graphs (DAGs). You'll practice creating data pipelines with DAGs and Apache Airflow.
  • Airflow and AWS
    • In this lesson, you'll connect Airflow to AWS by creating credentials, copying S3 data, using connections and hooks, and building an S3-to-Redshift DAG.
  • Data Quality
    • Students will learn how to track data lineage, set up data pipeline schedules, partition data to optimize pipelines, investigate data quality issues, and write tests to ensure data quality.
  • Production Data Pipelines
    • In this last lesson, students will learn how to build pipelines with maintainability and reusability in mind. They will also learn about pipeline monitoring.
  • Data Pipelines
    • Students work on a music streaming company’s data infrastructure by creating and automating a set of data pipelines with Airflow, and by monitoring and debugging production pipelines.
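
The syllabus mentions organizing tasks as a Directed Acyclic Graph (DAG) and writing tests to ensure data quality. As a rough illustration of those two ideas, here is a minimal sketch in plain Python. It does not use Airflow's API and is not taken from the course material; the task names, the `execution_order` helper, and the `check_has_rows` helper are all hypothetical, and the standard-library `graphlib` module (Python 3.9+) stands in for a real scheduler.

```python
from graphlib import TopologicalSorter

# Hypothetical task names for an S3-to-Redshift style pipeline.
# Each task maps to the set of tasks that must finish before it can run.
dependencies = {
    "copy_events_from_s3": {"create_tables"},
    "copy_songs_from_s3": {"create_tables"},
    "transform_data": {"copy_events_from_s3", "copy_songs_from_s3"},
    "run_quality_checks": {"transform_data"},
}


def execution_order(deps):
    """Return one valid run order for the tasks.

    Raises graphlib.CycleError if the dependencies contain a cycle,
    which is what makes the graph "acyclic" in practice.
    """
    return list(TopologicalSorter(deps).static_order())


def check_has_rows(table_name, row_count):
    """Simple data quality test: fail loudly when a table is empty."""
    if row_count < 1:
        raise ValueError(f"Data quality check failed: {table_name} has no rows")
    return True


order = execution_order(dependencies)
print(order)
```

In this sketch, `create_tables` always runs first and `run_quality_checks` always runs last; the two copy tasks have no dependency on each other, so a scheduler would be free to run them in either order or in parallel.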

Taught by

Sean Murdock

Related Courses

  • Building Advanced Codeless Pipelines on Cloud Data Fusion (Google Cloud via Coursera)
  • Data Lake Modernization on Google Cloud: Cloud Data Fusion (Google via Google Cloud Skills Boost)
  • Introduction to Data Quality (DataCamp)
  • Exploring the Lineage of Data with Cloud Data Fusion (Google via Google Cloud Skills Boost)
  • Building Codeless Pipelines on Cloud Data Fusion (Google via Google Cloud Skills Boost)