Source Systems, Data Ingestion, and Pipelines

Offered By: DeepLearning.AI via Coursera

Tags

Amazon Web Services (AWS), Terraform, Stream Processing, Batch Processing, Data Pipelines, Data Ingestion, ETL, DataOps

Course Description

Overview

In this course, you will explore various types of source systems, learn how they generate and update data, and troubleshoot common issues you might encounter when trying to connect to these systems in the real world. You’ll dive into the details of common ingestion patterns and implement batch and streaming pipelines. You’ll automate and orchestrate your data pipelines using infrastructure as code and pipelines as code tools. You’ll also explore AWS and open source tools for monitoring your data systems and data quality.

Syllabus

  • Working with Source Systems
    • In lesson 1, you will explore source systems data engineers commonly interact with. Then in lesson 2, you will learn how to connect to various source systems and troubleshoot common connectivity issues.
  • Data Ingestion
    • This week you will dive deep into the batch and streaming ingestion patterns. You will identify use cases and considerations for each, and then build a batch and a streaming ingestion pipeline. When looking at batch ingestion, you will compare and contrast the ETL and ELT paradigms. You will also explore various AWS services for batch and streaming ingestion.
  • DataOps
    • In the first lesson, you will explore DataOps automation practices, including applying CI/CD to both data and code, and using infrastructure as code tools like Terraform to automate the provisioning and management of your resources. Then in lesson 2, you will explore DataOps observability and monitoring practices, including using tools like Great Expectations to monitor data quality, and using Amazon CloudWatch to monitor your infrastructure.
  • Orchestration, Monitoring, and Automating Your Data Pipelines
    • This week, you will learn all about orchestrating your data pipeline tasks. You'll survey several orchestration tools, then focus on Airflow, one of the most popular and widely used tools in the field today. You'll explore the core components of Airflow, the Airflow UI, and how to create and manage DAGs using various Airflow features.
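The data-quality monitoring mentioned under DataOps can be illustrated with a hand-rolled "expectation"-style check in plain Python. This is a sketch of the idea behind tools like Great Expectations, not the library's actual API; the function names and result shape here are invented for illustration.

```python
def expect_column_not_null(rows, column):
    # Flag rows where the column is missing or None (a typical data-quality check).
    failures = [i for i, r in enumerate(rows) if r.get(column) is None]
    return {"success": not failures, "failed_rows": failures}

def expect_column_values_between(rows, column, low, high):
    # Flag rows whose value falls outside an expected range.
    failures = [i for i, r in enumerate(rows)
                if not (low <= r[column] <= high)]
    return {"success": not failures, "failed_rows": failures}
```

In practice, a DataOps pipeline runs a suite of such expectations against each new batch and alerts (or halts the pipeline) when any check fails, rather than letting bad data propagate downstream.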
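The core idea behind Airflow DAGs, running tasks in dependency order, can be sketched with the standard library alone. This toy orchestrator is a conceptual illustration, not Airflow's API; the task names and dependency mapping are hypothetical.

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

def run_pipeline(tasks, deps):
    """Run callables in dependency order, like a minimal DAG scheduler.

    tasks: {name: callable taking the dict of upstream results}
    deps:  {name: set of upstream task names}
    """
    order = list(TopologicalSorter(deps).static_order())
    results = {}
    for name in order:
        results[name] = tasks[name](results)
    return order, results
```

A linear extract >> transform >> load chain is the simplest case; the same scheduler handles fan-out and fan-in because `TopologicalSorter` only requires that every task run after its declared upstreams.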

Taught by

Joe Reis

Related Courses

Terraform Basics: Automate Provisioning of AWS EC2 Instances
Coursera Project Network via Coursera
DevOps CI/CD Pipeline: Automation from development to deployment
Universidad Anáhuac via edX
DevOps Pipeline: Automatización hasta el despliegue
Universidad Anáhuac via edX
DevOps Foundations: Software Development Optimization
Universidad Anáhuac via edX
Fundamentos de DevOps: Optimiza el desarrollo del software
Universidad Anáhuac via edX