YoVDO

Data Science on Google Cloud Platform: Building Data Pipelines

Offered By: LinkedIn Learning

Tags

Data Pipelines, Big Data, Python, Apache Beam, Data Processing, Cloud Pub/Sub, Streaming Data

Course Description

Overview

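Learn how to design and build big data pipelines on Google Cloud Platform, using Apache Beam and Cloud Dataflow for batch and streaming processing and Cloud Pub/Sub for messaging.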

Syllabus

Introduction
  • What goes into a data pipeline?
  • Data science modules covered
1. GCP Data Pipeline Products
  • GCP data pipeline options
  • Cloud Dataproc
  • Cloud Dataflow
  • Cloud Pub/Sub
2. Apache Beam
  • What is Apache Beam?
  • Beam pipelines
  • PCollections
  • Transforms
  • Pipeline I/O
  • Runners
3. Setting Up Dataflow
  • Setting up GCP for Dataflow
  • Setting up Python
  • Creating a simple pipeline
  • Executing in Dataflow
4. Data Processing with Beam and Dataflow
  • Reading text files
  • ParDo
  • GroupBy
  • Map
  • Combine
  • Writing data to text files
  • Other capabilities
5. Cloud Pub/Sub
  • What is Pub/Sub?
  • Topics and messages
  • Publishers
  • Subscribers
  • Create a topic
  • Create a subscription
  • Publish and receive
  • Python SDK
6. Streaming with Dataflow
  • Streaming with Dataflow
  • Windowing with Dataflow
  • Streaming and windowing example
Conclusion
  • Next steps
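Module 6 covers windowing. The core idea of a fixed (tumbling) window can be sketched in plain Python, without Beam: each event is assigned to the single window `[start, start + size)` that contains its event time, and counts are kept per window and key. This is a simplified model assuming integer event times in seconds; it ignores watermarks, triggers and late data, which Beam and Dataflow manage in a real streaming pipeline. The helper `fixed_window_counts` is hypothetical; in Beam the corresponding step would be `beam.WindowInto(beam.window.FixedWindows(60))` followed by a per-key combine.

```python
from collections import defaultdict


def fixed_window_counts(events, size_s):
    """Count events per key in fixed windows of `size_s` seconds (plain-Python model).

    `events` is an iterable of (event_time_seconds, key) pairs. Each event falls
    into exactly one window [start, start + size_s), where start = t - (t % size_s).
    Returns {(window_start, window_end, key): count}, ordered by window then key.
    """
    counts = defaultdict(int)
    for t, key in events:
        start = t - (t % size_s)  # start of the window containing this event
        counts[(start, start + size_s, key)] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    events = [(5, "a"), (59, "a"), (60, "a"), (61, "b"), (130, "a")]
    print(fixed_window_counts(events, 60))
    # {(0, 60, 'a'): 2, (60, 120, 'a'): 1, (60, 120, 'b'): 1, (120, 180, 'a'): 1}
```

Note that an event at exactly t = 60 opens the second window rather than closing the first, because each window includes its start and excludes its end.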

Taught by

Kumaran Ponnambalam
