Data Science on Google Cloud Platform: Building Data Pipelines
Offered By: LinkedIn Learning
Course Description
Overview
Learn how to design and build big data pipelines on Google Cloud Platform.
Syllabus
Introduction
- What goes into a data pipeline?
- Data science modules covered
- GCP data pipeline options
- Cloud Dataproc
- Cloud Dataflow
- Cloud Pub/Sub
- What is Apache Beam?
- Beam pipelines
- PCollections
- Transforms
- Pipeline I/O
- Runners
- Setting up GCP for Dataflow
- Setting up Python
- Creating a simple pipeline
- Executing in Dataflow
- Reading text files
- ParDo
- GroupBy
- Map
- Combine
- Writing data to text files
- Other capabilities
- What is Pub/Sub?
- Topics and messages
- Publishers
- Subscribers
- Create a topic
- Create a subscription
- Publish and receive
- Python SDK
- Streaming with Dataflow
- Windowing with Dataflow
- Streaming and windowing example
- Next steps
Taught by
Kumaran Ponnambalam
Related Courses
- Web Intelligence and Big Data, Indian Institute of Technology Delhi via Coursera
- Big Data for Better Performance, Open2Study
- Big Data and Education, Columbia University via edX
- Big Data Analytics in Healthcare, Georgia Institute of Technology via Udacity
- Data Mining with Weka, University of Waikato via Independent