Data Science on Google Cloud Platform: Building Data Pipelines

Offered By: LinkedIn Learning

Tags

Data Pipelines Courses, Big Data Courses, Python Courses, Apache Beam Courses, Data Processing Courses, Cloud Pub/Sub Courses, Streaming Data Courses

Course Description

Overview

Learn how to design and build big data pipelines on Google Cloud Platform.

Syllabus

Introduction
  • What goes into a data pipeline?
  • Data science modules covered
1. GCP Data Pipeline Products
  • GCP data pipeline options
  • Cloud Dataproc
  • Cloud Dataflow
  • Cloud Pub/Sub
2. Apache Beam
  • What is Apache Beam?
  • Beam pipelines
  • PCollections
  • Transforms
  • Pipeline I/O
  • Runners
3. Setting Up Dataflow
  • Setting up GCP for Dataflow
  • Setting up Python
  • Creating a simple pipeline
  • Executing in Dataflow
4. Data Processing with Beam and Dataflow
  • Reading text files
  • ParDo
  • GroupBy
  • Map
  • Combine
  • Writing data to text files
  • Other capabilities
5. Cloud Pub/Sub
  • What is Pub/Sub?
  • Topics and messages
  • Publishers
  • Subscribers
  • Create a topic
  • Create a subscription
  • Publish and receive
  • Python SDK
6. Streaming with Dataflow
  • Streaming with Dataflow
  • Windowing with Dataflow
  • Streaming and windowing example
Conclusion
  • Next steps

Taught by

Kumaran Ponnambalam