Advanced Data Engineering
Offered By: Pragmatic AI Labs via edX
Course Description
Overview
Master Scalable Data Engineering with Cutting-Edge Tools
- Learn to handle massive datasets efficiently with this advanced course
- Gain practical expertise in scaling data systems using modern technologies
- Ideal for data scientists, engineers & professionals with data handling experience
Course Highlights:
- Leverage Celery & RabbitMQ for scalable data consumption
- Optimize workflows with Apache Airflow for efficient management
- Utilize Vector & Graph databases for robust data management at scale
- Hands-on projects for real-world experience in solving data challenges
- Create scalable systems & analyze performance for optimum results
Upskill to design, build & optimize data engineering pipelines that can handle complex, large-scale datasets. Prepare for demanding data roles by mastering advanced techniques with this comprehensive training.
Syllabus
Module 1: Queues and Databases-RabbitMQ and MySQL (6 hours)
- Video: Meet your instructor: Alfredo Deza (1 minute, Preview module)
- Video: About this course (2 minutes)
- Reading: Connect with your instructor (10 minutes)
- Reading: Meet your instructor: Noah Gift (10 minutes)
- Reading: Course structure and discussion etiquette (10 minutes)
- Video: Introduction (1 minute)
- Video: Overview of Queues (5 minutes)
- Video: What is Celery? (3 minutes)
- Reading: Key Terms (10 minutes)
- Reading: Introduction to Celery (10 minutes)
- Video: Use cases for RabbitMQ (3 minutes)
- Reading: Using RabbitMQ with Docker (10 minutes)
- Reading: External lab: Start RabbitMQ in a development environment (10 minutes)
- Video: Overview of a Flask and Celery application (3 minutes)
- Video: Summary (1 minute)
- Quiz: Introduction to RabbitMQ and Flask (30 minutes)
- Video: Introduction (0 minutes)
- Video: Configuring Celery with Flask (4 minutes)
- Video: Connecting Celery with RabbitMQ (5 minutes)
- Reading: Key Terms (10 minutes)
- Reading: Build a web app by using Python and Flask (10 minutes)
- Reading: Background tasks with Celery (10 minutes)
- Video: Defining a Celery task in Flask (3 minutes)
- Video: Fire and forget task in Flask (2 minutes)
- Video: Retrieve values from asynchronous tasks (3 minutes)
- Reading: External lab: Add a new Celery task for RabbitMQ (10 minutes)
- Video: Summary (1 minute)
- Quiz: RabbitMQ with Celery and Flask (30 minutes)
- Video: MySQL Overview (2 minutes)
- Reading: Key Terms (10 minutes)
- Reading: Getting Started with MySQL (10 minutes)
- Video: MySQL from Terminal (3 minutes)
- Video: Archive and Drop Database (5 minutes)
- Video: Import external database Sakila (7 minutes)
- Video: Modify database Sakila (4 minutes)
- Video: Bash pipelines with MySQL (5 minutes)
- Video: MySQL to Python Standard Library Web Server (4 minutes)
- Ungraded Lab: Linux Hacking with MySQL (60 minutes)
- Quiz: Quiz-MySQL for Data Engineering (30 minutes)
- Reading: Lesson Reflection (10 minutes)
- Discussion Prompt: Meet and greet (optional) (10 minutes)
- Quiz: Queues and Databases - Final week quiz (30 minutes)
Module 2: Optimizing Workflow Management at Scale with Apache Airflow (5 hours)
- Video: Introduction (1 minute, Preview module)
- Video: What is Apache Airflow? (6 minutes)
- Reading: Key Terms (10 minutes)
- Reading: What is Apache Airflow (10 minutes)
- Video: Installing Apache Airflow from PyPI (5 minutes)
- Video: Using Apache Airflow with Docker (6 minutes)
- Reading: Exploring the Airflow User Interface (10 minutes)
- Reading: External lab: Install Apache Airflow (10 minutes)
- Video: Exploring the Airflow UI (6 minutes)
- Quiz: Quiz-Installing Apache Airflow (30 minutes)
- Reading: Lesson Reflection (10 minutes)
- Video: Introduction (0 minutes)
- Video: Exploring directed acyclic graphs (DAG) (10 minutes)
- Reading: Key Terms (10 minutes)
- Reading: External lab: Create a DAG (10 minutes)
- Video: Creating a DAG (7 minutes)
- Video: Running a backfill (4 minutes)
- Reading: Architecture overview (10 minutes)
- Video: Testing and validation (7 minutes)
- Video: Summary (0 minutes)
- Quiz: Quiz-Apache Airflow Fundamentals (30 minutes)
- Reading: Lesson Reflection (10 minutes)
- Video: Introduction (1 minute)
- Video: Identifying a task to build a DAG (4 minutes)
- Reading: Key Terms (10 minutes)
- Reading: External Lab: Build a data pipeline for census data (10 minutes)
- Video: Retrieving remote data (4 minutes)
- Video: Cleaning and normalizing data (4 minutes)
- Video: Inspecting the UI for results (4 minutes)
- Reading: Build Data Pipelines with Apache Airflow (10 minutes)
- Video: Summary (1 minute)
- Reading: Lesson Reflection (10 minutes)
- Quiz: Quiz-Creating a pipeline (30 minutes)
- Quiz: Final Week Quiz-Optimizing Workflow Management at Scale with Apache Airflow (30 minutes)
Module 3: Achieving Scalability with Vector, Graph, and Key/Value Databases (5 hours)
- Video: Picking the proper database (3 minutes, Preview module)
- Video: What are vector databases and how they work (2 minutes)
- Reading: Key Terms (10 minutes)
- Reading: What is a Vector Database? (10 minutes)
- Video: Implementing Semantic search (4 minutes)
- Video: Quickstart Qdrant (3 minutes)
- Reading: External Lab: Run Quickstart of Qdrant (10 minutes)
- Video: Qdrant Rust Client (3 minutes)
- Reading: External Lab: Extend Semantic Search (10 minutes)
- Video: Vector Database Architectures (2 minutes)
- Video: Hands-on lab: Enhance Semantic Search (3 minutes)
- Reading: Jaccard index (10 minutes)
- Quiz: Quiz-Introduction to Vector Databases (30 minutes)
- Reading: Lesson Reflection (10 minutes)
- Video: Graph data models and database concepts (2 minutes)
- Reading: Key Terms (10 minutes)
- Reading: Rust CLI with Clap (10 minutes)
- Video: Introduction to Amazon Neptune (2 minutes)
- Reading: External Lab: Rust Graph CLI Tool (10 minutes)
- Video: Graph algorithms: UFC graph centrality in Rust (4 minutes)
- Video: Kosaraju Community Detection in Graphs (4 minutes)
- Video: Shortest Path with Graphs (3 minutes)
- Reading: Amazon Neptune (10 minutes)
- Video: Key Components of Rust CLI Tool (1 minute)
- Video: Lab Walkthrough: Building a Rust Graph CLI Tool (2 minutes)
- Quiz: Quiz-Introduction to Graph Databases (30 minutes)
- Reading: Lesson Reflection (10 minutes)
- Quiz: Final Quiz-Achieving Scalability with Vector, Graph, and Key/Value Databases (30 minutes)
- Ungraded Lab: Social Media Recommender (60 minutes)
Module 4: Real-world Advanced Data Engineering Projects (5 hours)
- Video: Learn AWS CloudShell for Dynamo Development (4 minutes, Preview module)
- Video: Learn AWS CodeCatalyst for Dynamo Development (5 minutes)
- Reading: Key Terms (10 minutes)
- Reading: Amazon CodeCatalyst (10 minutes)
- Video: Leveraging AWS CodeWhisperer for Dynamo Development (4 minutes)
- Video: Create a Table with CLI (1 minute)
- Video: Populate a Table With Batching Records (1 minute)
- Video: Query a Table with Records (2 minutes)
- Reading: External Lab: Extended DynamoDB (10 minutes)
- Video: Project Walkthrough (2 minutes)
- Quiz: Quiz-Building a solution with DynamoDB with the AWS CLI (30 minutes)
- Reading: Lesson Reflection (10 minutes)
- Video: Introduction (1 minute)
- Video: Overview of a pipeline requirements (3 minutes)
- Reading: Key Terms (10 minutes)
- Reading: Quick start for SQLAlchemy (10 minutes)
- Video: Using SQLAlchemy with Pandas (6 minutes)
- Reading: Explore and analyze data with Python (10 minutes)
- Video: Persisting data in a task (6 minutes)
- Video: Reviewing the results (4 minutes)
- Video: Summary (1 minute)
- Quiz: Quiz-Persisting data through a multi-task DAG with Pandas (30 minutes)
- Reading: Lesson Reflection (10 minutes)
- Reading: Recommended Next Steps (10 minutes)
- Quiz: Final Quiz-Advanced Data Engineering (30 minutes)
- Ungraded Lab: Jupyter Sandbox (60 minutes)
- Ungraded Lab: VS Code Sandbox (60 minutes)
Taught by
Alfredo Deza and Noah Gift
Related Courses
- 4.0 Shades of Digitalisation for the Chemical and Process Industries (University of Padova via FutureLearn)
- A Beginner’s Guide to Data Handling and Management in Excel (Packt via FutureLearn)
- A Day in the Life of a Data Engineer (Korean) (Amazon Web Services via AWS Skill Builder)
- A Cloud Guru's Elastic Certified Engineer Exam Preparation Course (A Cloud Guru)
- Azure Cosmos DB Deep Dive (A Cloud Guru)