Data Science on Google Cloud Platform: Building Data Pipelines
Offered By: LinkedIn Learning
Course Description
Overview
Learn how to design and build big data pipelines on Google Cloud Platform.
Syllabus
Introduction
- What goes into a data pipeline?
- Data science modules covered
- GCP data pipeline options
- Cloud Dataproc
- Cloud Dataflow
- Cloud Pub/Sub
- What is Apache Beam?
- Beam pipelines
- PCollections
- Transforms
- Pipeline I/O
- Runners
- Setting up GCP for Dataflow
- Setting up Python
- Creating a simple pipeline
- Executing in Dataflow
- Reading text files
- ParDo
- GroupBy
- Map
- Combine
- Writing data to text files
- Other capabilities
- What is Pub/Sub?
- Topics and messages
- Publishers
- Subscribers
- Create a topic
- Create a subscription
- Publish and receive
- Python SDK
- Streaming with Dataflow
- Windowing with Dataflow
- Streaming and windowing example
- Next steps
Taught by
Kumaran Ponnambalam
Related Courses
- Google Cloud Big Data and Machine Learning Fundamentals en Español (Google Cloud via Coursera)
- Data Analysis with Python (IBM via Coursera)
- Intro to TensorFlow 日本語版 (Google Cloud via Coursera)
- TensorFlow on Google Cloud - Français (Google Cloud via Coursera)
- Freedom of Data with SAP Data Hub (SAP Learning)