Data Science on Google Cloud Platform: Building Data Pipelines
Offered By: LinkedIn Learning
Course Description
Overview
Learn how to design and build big data pipelines on Google Cloud Platform.
Syllabus
Introduction
- What goes into a data pipeline?
- Data science modules covered
- GCP data pipeline options
- Cloud Dataproc
- Cloud Dataflow
- Cloud Pub/Sub
- What is Apache Beam?
- Beam pipelines
- PCollections
- Transforms
- Pipeline I/O
- Runners
- Setting up GCP for Dataflow
- Setting up Python
- Creating a simple pipeline
- Executing in Dataflow
- Reading text files
- ParDo
- GroupBy
- Map
- Combine
- Writing data to text files
- Other capabilities
- What is Pub/Sub?
- Topics and messages
- Publishers
- Subscribers
- Create a topic
- Create a subscription
- Publish and receive
- Python SDK
- Streaming with Dataflow
- Windowing with Dataflow
- Streaming and windowing example
- Next steps
Taught by
Kumaran Ponnambalam
Related Courses
- Hands-On with Dataflow (A Cloud Guru)
- Azure Data Engineer con Databricks y Azure Data Factory (Coursera Project Network via Coursera)
- Data Integration with Microsoft Azure Data Factory (Microsoft via Coursera)
- Azure Data Factory: Implement SCD Type 1 (Coursera Project Network via Coursera)
- MLOps1 (Azure): Deploying AI & ML Models in Production using Microsoft Azure Machine Learning (statistics.com via edX)