YoVDO

DeepLearning.AI Data Engineering

Offered By: DeepLearning.AI via Coursera

Tags

Amazon Web Services (AWS) Courses
Data Storage Courses
Data Transformation Courses
Data Modeling Courses
Data Engineering Courses
Data Pipelines Courses
Data Ingestion Courses
ETL Courses

Course Description

Overview

The DeepLearning.AI Data Engineering Professional Certificate is a comprehensive online program for data engineers and practitioners looking to start or grow their careers. Organizations of all sizes and across all industries are capturing and generating data at an ever-increasing pace. Within these organizations, every team, from executives, sales and marketing, finance and operations, product and engineering, to customer service, can derive insights and value from organizational data. Whether the end use case is data science, machine learning, or analytics, data engineering is what allows raw data to be converted to value for the business. This is why the role of data engineer is one of the highest-demand jobs in tech today.

Throughout this program, you'll learn the foundations of data engineering while gaining hands-on experience designing and implementing data architectures using AWS and open-source tools. Taught by industry expert Joe Reis, co-author of Fundamentals of Data Engineering, this certificate equips you with the skills and knowledge to excel in a high-demand field, focusing on ingesting, processing, transforming, storing, and serving data to data stakeholders to drive organizational and business objectives.

The practical labs were developed in partnership with AWS and Factored.AI to provide you with an authentic experience building data systems on the cloud. With this certificate, you will have the tools to further your data engineering career.

Syllabus

Course 1: Introduction to Data Engineering
- Offered by DeepLearning.AI and Amazon Web Services. In this course, you will be introduced to the data engineering lifecycle, from data ...

Course 2: Source Systems, Data Ingestion, and Pipelines
- Offered by DeepLearning.AI and Amazon Web Services. In this course, you will explore various types of source systems, learn how they ...

Course 3: Data Storage and Queries
- Offered by DeepLearning.AI and Amazon Web Services. In this course, you will learn about the raw ingredients and processes that are used to ...

Course 4: Data Modeling, Transformation, and Serving
- Offered by DeepLearning.AI and Amazon Web Services. In this course, you’ll model, transform, and serve data for both analytics and machine ...


Courses

  • Course 4: Data Modeling, Transformation, and Serving

    1 day 3 hours 17 minutes

    In this course, you’ll model, transform, and serve data for both analytics and machine learning use cases. You’ll explore various data modeling techniques for batch analytics, including normalization, star schema, data vault, and one big table, and you’ll use dbt to transform a dataset based on a star schema and one big table. You’ll also compare the Inmon vs Kimball data modeling approaches for data warehouses. You’ll model and transform a tabular dataset for machine learning purposes. You’ll also model and transform unstructured image and textual data. You’ll explore distributed processing frameworks such as Hadoop MapReduce and Spark, and perform stream processing. You’ll identify different ways of serving data for analytics and machine learning, including using views and materialized views, and you’ll describe how a semantic layer built on top of your data model can support the business. In the last week of this course, you’ll complete a capstone project where you’ll build an end-to-end data pipeline that encompasses all of the stages of the data engineering lifecycle to serve data that provides business value.
  • Course 3: Data Storage and Queries

    22 hours 19 minutes

    In this course, you will learn about the raw ingredients and processes that are used to physically store data on disk and in memory. You’ll explore different storage systems, including object, block, and file storage, as well as databases, that are built on top of these raw ingredients. You’ll also get a chance to use the Cypher language to query a Neo4j graph database, and perform vector similarity search, a key feature behind generative AI and large language models. You will explore the evolution of data storage abstractions, from data warehouses, to data lakes, and data lakehouses, while comparing the advantages and drawbacks of each architectural paradigm. With hands-on practice, you will design a simple data lake using AWS Glue, and build a data lakehouse using AWS Lake Formation and Apache Iceberg. In the last week of this course, you’ll see how queries work behind the scenes, practice writing more advanced SQL queries, compare query performance in row-oriented vs column-oriented storage, and perform streaming queries using Apache Flink.
  • Course 1: Introduction to Data Engineering

    17 hours 38 minutes

    In this course, you will be introduced to the data engineering lifecycle, from data generation in source systems, to ingestion, transformation, storage, and serving data to downstream stakeholders. You’ll study the key undercurrents that affect all stages of the lifecycle, and start developing a framework for how to think like a data engineer. To gain hands-on practice, you’ll gather stakeholder needs, translate those needs into system requirements, and choose tools and technologies to build systems that provide business value. By the end of this course, you’ll be spinning up batch and streaming data pipelines to serve product recommendations on the AWS cloud!
  • Course 2: Source Systems, Data Ingestion, and Pipelines

    1 day 10 hours 19 minutes

    In this course, you will explore various types of source systems, learn how they generate and update data, and troubleshoot common issues you might encounter when trying to connect to these systems in the real world. You’ll dive into the details of common ingestion patterns and implement batch and streaming pipelines. You’ll automate and orchestrate your data pipelines using infrastructure-as-code and pipelines-as-code tools. You’ll also explore AWS and open-source tools for monitoring your data systems and data quality.

Taught by

Joe Reis

Related Courses

Communicating Data Science Results
University of Washington via Coursera
Cloud Computing Applications, Part 2: Big Data and Applications in the Cloud
University of Illinois at Urbana-Champaign via Coursera
Cloud Computing Infrastructure
University System of Maryland via edX
Google Cloud Platform for AWS Professionals
Google via Coursera
Introduction to Apache Spark and AWS
University of London International Programmes via Coursera