Introduction to Data Engineering on Google Cloud
Offered By: Google Cloud via Coursera
Course Description
Overview
In this course, you learn about data engineering on Google Cloud, the roles and responsibilities of data engineers, and how those map to offerings provided by Google Cloud. You also learn about ways to address data engineering challenges.
Syllabus
- Course Introduction
  - This section welcomes you to the Introduction to Data Engineering on Google Cloud course, and provides an overview of the course structure and goals.
- Data Engineering Tasks and Components
  - This module provides an introduction to the role of a data engineer. It covers key concepts such as data sources and sinks, data formats, storage options on Google Cloud, metadata management, and the use of Analytics Hub for data sharing within and outside an organization.
- Data Replication and Migration
  - This module provides an overview of data replication and migration on Google Cloud. It covers the basic architecture, the 'gcloud' command-line tool, Storage Transfer Service, Transfer Appliance, and Datastream, along with their functionalities and use cases.
- The Extract and Load Data Pipeline Pattern
  - This module focuses on data extraction and loading processes on Google Cloud, particularly with BigQuery. It covers the basic extraction and loading architecture, the bq command-line tool, BigQuery Data Transfer Service, and BigLake as an alternative to traditional extract-load patterns.
- The Extract, Load, and Transform Data Pipeline Pattern
  - This module provides an overview of ELT (extract, load, transform) processes on Google Cloud. It covers the basic ELT architecture, a common ELT pipeline example, BigQuery's capabilities for scripting and scheduling SQL, and the functionality and use cases of Dataform.
- The Extract, Transform, and Load Data Pipeline Pattern
  - This module provides an overview of ETL (extract, transform, load) processes on Google Cloud. It covers the basic ETL architecture, GUI tools, batch and streaming data processing options (Dataproc, Dataproc Serverless), and the role of Bigtable in data pipelines.
- Automation Techniques
  - This module focuses on automation patterns and options for pipelines on Google Cloud. It covers various tools and services like Cloud Scheduler, Workflows, Cloud Composer, Cloud Run functions, and Eventarc, along with their functionalities and use cases for automation.
- Course Summary
  - In this final section, we review what was presented in this course and discuss the next steps to continue your cloud learning journey.
Taught by
Google Cloud Training
Related Courses
- Serverless Data Analysis with Google BigQuery and Cloud Dataflow en Français (Google Cloud via Coursera)
- Google Cloud Big Data and Machine Learning Fundamentals en Español (Google Cloud via Coursera)
- Google Cloud Big Data and Machine Learning Fundamentals 日本語版 (Google Cloud via Coursera)
- Industrial IoT on Google Cloud (Google Cloud via Coursera)
- Google Cloud Platform Big Data and Machine Learning Fundamentals em Português Brasileiro (Google Cloud via Coursera)