YoVDO

Reddit Data Pipeline Engineering with AWS - End-to-End Data Engineering

Offered By: CodeWithYu via YouTube

Tags

Data Engineering Courses, Apache Airflow Courses, PostgreSQL Courses, Amazon Athena Courses, Amazon S3 Courses, AWS Glue Courses, Amazon Redshift Courses, ETL Courses, Celery Courses

Course Description

Overview

Build an end-to-end Reddit data pipeline on AWS. Learn to extract data from Reddit's API, orchestrate ETL processes with Apache Airflow and Celery, and store the results in Amazon S3. Use AWS Glue for data cataloging and ETL jobs, query and transform data with Amazon Athena, and set up an Amazon Redshift cluster for analytics. The course also covers best practices for loading data into Redshift and basic data visualization, with hands-on demonstrations showing how the tools fit together into a single ETL process.

Syllabus

Introduction
Setting up Apache Airflow with Celery Backend and Postgres
Reddit Data Pipeline with Airflow
Cleaning and Transforming Reddit Data
Connecting to AWS from Airflow
AWS Glue data transformation
Querying Data with Athena
Setting up Redshift Data Warehouse
Redshift Data Warehouse Query Tool
Loading Data into Data Warehouse
Charting with Redshift Data Warehouse
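
To give a feel for the "Cleaning and Transforming Reddit Data" step in the syllabus, here is a minimal sketch in Python. It is not taken from the course: the field names, the `transform_post` and `to_csv` helpers, and the choice of CSV output are all assumptions made for illustration, and it uses only the standard library, so no Reddit or AWS credentials are needed.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical column set; the course's actual schema is not given in this listing.
FIELDS = ["id", "title", "score", "num_comments", "author", "created_utc"]


def transform_post(raw: dict) -> dict:
    """Normalize one raw post record into typed, CSV-friendly values."""
    return {
        "id": str(raw["id"]),
        # Collapse runs of whitespace and trim the ends of the title.
        "title": " ".join(str(raw.get("title", "")).split()),
        "score": int(raw.get("score", 0)),
        "num_comments": int(raw.get("num_comments", 0)),
        # A missing author is stored as a placeholder (an assumption).
        "author": str(raw.get("author") or "[deleted]"),
        # Convert the epoch timestamp to an ISO 8601 string in UTC.
        "created_utc": datetime.fromtimestamp(
            float(raw["created_utc"]), tz=timezone.utc
        ).isoformat(),
    }


def to_csv(posts: list[dict]) -> str:
    """Transform raw posts and serialize them as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(transform_post(p) for p in posts)
    return buf.getvalue()


if __name__ == "__main__":
    sample = [
        {
            "id": "abc123",
            "title": "  Hello   world ",
            "score": "42",
            "num_comments": 7,
            "author": None,
            "created_utc": 0,
        }
    ]
    print(to_csv(sample))
```

In a pipeline like the one described, text produced this way would typically be written to a file and uploaded to Amazon S3 by an Airflow task, where AWS Glue and Athena could then read it. That wiring is left out here because it depends on account-specific configuration.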


Taught by

CodeWithYu

Related Courses

Building Batch Data Pipelines on GCP auf Deutsch
Google Cloud via Coursera
Building Batch Data Pipelines on GCP en Français
Google Cloud via Coursera
Mastering Azure Data Factory: From Basics to Advanced Level
Udemy
Data Science de A a Z - Extraçao e Exibição dos Dados
Udemy
Building Batch Data Processing Solutions in Microsoft Azure
Pluralsight