Reddit Data Pipeline Engineering with AWS - End-to-End Data Engineering
Offered By: CodeWithYu via YouTube
Course Description
Overview
This course walks through building an end-to-end Reddit data pipeline on AWS. You extract data from Reddit's API, orchestrate the ETL process with Apache Airflow and Celery, and store the results in Amazon S3. From there, you use AWS Glue for data cataloging and ETL jobs, query and transform the data with Amazon Athena, and set up a Redshift cluster for analytics. The course also covers best practices for loading data into Amazon Redshift and techniques for visualizing it, with hands-on demonstrations of how the tools fit together into a single ETL process.
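To make the extract, clean, and store steps concrete, here is a minimal sketch of the cleaning stage that sits between the Reddit API and Amazon S3. It is not taken from the course: the field list, the function names (`clean_post`, `posts_to_csv`, `s3_key_for`), and the S3 key layout are all assumptions made for illustration, and the sketch uses only the Python standard library so it can run without Reddit or AWS credentials.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical subset of post fields to keep; the course may select others.
POST_FIELDS = ("id", "title", "score", "num_comments", "author", "created_utc", "over_18")


def clean_post(raw: dict) -> dict:
    """Normalize one raw post record into typed, CSV-friendly values."""
    created = datetime.fromtimestamp(float(raw["created_utc"]), tz=timezone.utc)
    return {
        "id": str(raw["id"]),
        # Collapse runs of whitespace (including newlines) inside the title.
        "title": " ".join(str(raw.get("title", "")).split()),
        "score": int(raw.get("score", 0)),
        "num_comments": int(raw.get("num_comments", 0)),
        # Assumed placeholder for posts whose author is missing.
        "author": str(raw.get("author") or "[deleted]"),
        "created_utc": created.isoformat(),
        "over_18": bool(raw.get("over_18", False)),
    }


def posts_to_csv(raw_posts: list) -> str:
    """Clean every post and serialize the result as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=POST_FIELDS, lineterminator="\n")
    writer.writeheader()
    for raw in raw_posts:
        writer.writerow(clean_post(raw))
    return buf.getvalue()


def s3_key_for(subreddit: str, run_date: str) -> str:
    """Build a hypothetical object key for the raw-data area of a bucket."""
    return f"raw/reddit_{subreddit}_{run_date}.csv"
```

In an orchestrated pipeline, a function like `posts_to_csv` would typically run inside an Airflow task, with a later task uploading the resulting text to the key returned by `s3_key_for`.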
Syllabus
Introduction
Setting up Apache Airflow with Celery Backend and Postgres
Reddit Data Pipeline with Airflow
Cleaning and Transforming Reddit Data
Connecting to AWS from Airflow
AWS Glue Data Transformation
Querying Data with Athena
Setting up Redshift Data Warehouse
Redshift Data Warehouse Query Tool
Loading Data into Data Warehouse
Charting with Redshift Data Warehouse
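For the warehouse-loading step in the syllabus above, a common approach is Redshift's `COPY` command, which bulk-loads files from Amazon S3. The sketch below only builds the SQL text; it is an illustration under assumptions, not the course's code. The helper name `build_copy_statement` and the table, bucket, and IAM role values in the usage example are hypothetical placeholders.

```python
def build_copy_statement(table: str, s3_uri: str, iam_role_arn: str) -> str:
    """Return a Redshift COPY statement that loads a CSV file with a header row.

    The arguments are interpolated directly, so they must come from trusted
    configuration, never from user input.
    """
    if not s3_uri.startswith("s3://"):
        raise ValueError("s3_uri must start with s3://")
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS CSV\n"
        "IGNOREHEADER 1;"
    )


# Usage with placeholder values (all hypothetical):
statement = build_copy_statement(
    table="reddit_posts",
    s3_uri="s3://example-bucket/transformed/reddit_posts.csv",
    iam_role_arn="arn:aws:iam::123456789012:role/example-redshift-role",
)
```

The resulting statement can be run from the Redshift query editor or any SQL client connected to the cluster.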
Taught by
CodeWithYu
Related Courses
Building Batch Data Pipelines on GCP auf Deutsch (Google Cloud via Coursera)
Building Batch Data Pipelines on GCP en Français (Google Cloud via Coursera)
Mastering Azure Data Factory: From Basics to Advanced Level (Udemy)
Data Science de A a Z - Extraçao e Exibição dos Dados (Udemy)
Building Batch Data Processing Solutions in Microsoft Azure (Pluralsight)