YoVDO

Data Science on Google Cloud Platform: Building Data Pipelines

Offered By: LinkedIn Learning

Tags

Data Pipelines Courses Big Data Courses Python Courses Apache Beam Courses Data Processing Courses Cloud Pub/Sub Courses Streaming Data Courses

Course Description

Overview

Learn how to design and build big data pipelines on Google Cloud Platform.

Syllabus

Introduction
  • What goes into a data pipeline?
  • Data science modules covered
1. GCP Data Pipeline Products
  • GCP data pipeline options
  • Cloud Dataproc
  • Cloud Dataflow
  • Cloud Pub/Sub
2. Apache Beam
  • What is Apache Beam?
  • Beam pipelines
  • PCollections
  • Transforms
  • Pipeline I/O
  • Runners
3. Setting Up Dataflow
  • Setting up GCP for Dataflow
  • Setting up Python
  • Creating a simple pipeline
  • Executing in Dataflow
4. Data Processing with Beam and Dataflow
  • Reading text files
  • ParDo
  • GroupBy
  • Map
  • Combine
  • Writing data to text files
  • Other capabilities
5. Cloud Pub/Sub
  • What is Pub/Sub?
  • Topics and messages
  • Publishers
  • Subscribers
  • Create a topic
  • Create a subscription
  • Publish and receive
  • Python SDK
6. Streaming with Dataflow
  • Streaming with Dataflow
  • Windowing with Dataflow
  • Streaming and windowing example
Conclusion
  • Next steps

Taught by

Kumaran Ponnambalam

Related Courses

Google Cloud Big Data and Machine Learning Fundamentals en Español
Google Cloud via Coursera
Data Analysis with Python
IBM via Coursera
Intro to TensorFlow 日本語版
Google Cloud via Coursera
TensorFlow on Google Cloud - Français
Google Cloud via Coursera
Freedom of Data with SAP Data Hub
SAP Learning