Web Applications and Command-Line Tools for Data Engineering
Offered By: Pragmatic AI Labs via edX
Course Description
Overview
In this practical course, you'll gain essential skills for modern data engineering:
- Build interactive Jupyter notebooks for data analysis and machine learning
- Deploy notebooks on cloud platforms like Google Colab and AWS SageMaker
- Construct scalable Python microservices using FastAPI
- Containerize and deploy machine learning microservices
- Create robust command-line tools in Python and Rust
- Automate testing and publishing of your data engineering projects
Whether you're a data engineer, data scientist, or analyst, this course gives you hands-on practice building notebooks, microservices, and command-line tools that you can apply on the job.
Syllabus
Module 1: Jupyter Notebooks (4 hours)
- Introduction to web applications and command-line tools for data engineering
- Overview of key concepts
- Getting started with Jupyter notebooks
- Code cells and text cells in Jupyter
- Magics in Jupyter
- Overview of Jupyter Lab
Module 2: Cloud-Hosted Notebooks (5 hours)
- Introduction to Google Colab
- Tour of Colab features
- Data and documents in Colab
- Introduction to AWS SageMaker
- Tour of SageMaker Studio
- Overview of SageMaker Pipelines
Module 3: Python Microservices (12 hours)
- Introduction to building Python microservices
- Benefits of microservices
- Setting up Python project structure for CI
- Building a random fruit web app with Python
- Introduction to Python microservices with FastAPI
- Building FastAPI microservices for ML predictions
- Deploying a Python Lambda microservice
- Introduction to building containerized microservices
- Why use containers for microservices?
- Deploying a containerized .NET 6 API
- Deploying a containerized ML microservice
Module 4: Python Packaging and Rust Command-Line Tools (19 hours)
- Introduction to Python packaging and command-line tools
- Getting started with Python projects
- Overview of command-line tool frameworks
- Using Click to build a command-line tool
- Exploring advanced command-line tool features
- Introduction to packaging and distributing your Python project
- Working with Python setup tools
- Uploading to a Python registry
- Introduction to continuous integration for command-line tools
- Automating testing and publishing with GitHub Actions
- Introduction to Rust command-line tools
- Working with user input, output, and modules in Rust
- Optimizing Rust command-line tools
- Big O notation final challenge
Taught by
Noah Gift, Alfredo Deza and Kennedy Behrman
Related Courses
- In-Memory Database Management (内存数据库管理), openHPI
- CS115x: Advanced Apache Spark for Data Science and Data Engineering, University of California, Berkeley via edX
- Processing Big Data with Azure Data Lake Analytics, Microsoft via edX
- Google Cloud Big Data and Machine Learning Fundamentals en Español, Google Cloud via Coursera
- Google Cloud Big Data and Machine Learning Fundamentals 日本語版, Google Cloud via Coursera