Reddit Data Pipeline Engineering with AWS - End-to-End Data Engineering
Offered By: CodeWithYu via YouTube
Course Description
Overview
Embark on a comprehensive end-to-end data engineering journey, focusing on building a Reddit data pipeline using AWS services. Learn to extract data from Reddit's API, orchestrate ETL processes with Apache Airflow and Celery, and efficiently store data in Amazon S3. Discover how to leverage AWS Glue for data cataloging and ETL jobs, query and transform data using Amazon Athena, and set up a Redshift cluster for analytics. Gain insights into best practices for loading data into Amazon Redshift and explore data visualization techniques. Through hands-on demonstrations, master the integration of various tools and technologies to create a seamless ETL process, enhancing your skills in data pipeline engineering and AWS cloud services.
Syllabus
Introduction
Setting up Apache Airflow with Celery Backend and Postgres
Reddit Data Pipeline with Airflow
Cleaning and Transforming Reddit Data
Connecting to AWS from Airflow
AWS Glue data transformation
Querying Data with Athena
Setting up Redshift Data Warehouse
Redshift Data Warehouse Query Tool
Loading Data into Data Warehouse
Charting with Redshift Data Warehouse
Taught by
CodeWithYu
Related Courses
Building Data Lakes on AWS (Amazon Web Services via Coursera)
Analyzing Data on AWS (Pluralsight)
Análisis serverless de data en Amazon S3 usando Athena (Coursera Project Network via Coursera)
AWS Athena Tutorial with Hands on LAB | Serverless Querying (Udemy)
Getting Started with Data Analytics on AWS (Amazon Web Services via edX)