Data Science on Google Cloud Platform: Building Data Pipelines

Offered By: LinkedIn Learning

Tags

Data Pipelines Courses, Big Data Courses, Python Courses, Apache Beam Courses, Data Processing Courses, Cloud Pub/Sub Courses, Streaming Data Courses

Course Description

Overview

Learn how to design and build big data pipelines on Google Cloud Platform.

Syllabus

Introduction
  • What goes into a data pipeline?
  • Data science modules covered
1. GCP Data Pipeline Products
  • GCP data pipeline options
  • Cloud Dataproc
  • Cloud Dataflow
  • Cloud Pub/Sub
2. Apache Beam
  • What is Apache Beam?
  • Beam pipelines
  • PCollections
  • Transforms
  • Pipeline I/O
  • Runners
3. Setting Up Dataflow
  • Setting up GCP for Dataflow
  • Setting up Python
  • Creating a simple pipeline
  • Executing in Dataflow
4. Data Processing with Beam and Dataflow
  • Reading text files
  • ParDo
  • GroupBy
  • Map
  • Combine
  • Writing data to text files
  • Other capabilities
5. Cloud Pub/Sub
  • What is Pub/Sub?
  • Topics and messages
  • Publishers
  • Subscribers
  • Create a topic
  • Create a subscription
  • Publish and receive
  • Python SDK
6. Streaming with Dataflow
  • Streaming with Dataflow
  • Windowing with Dataflow
  • Streaming and windowing example
Conclusion
  • Next steps

Taught by

Kumaran Ponnambalam

Related Courses

  • Introduction to Google Cloud (A Cloud Guru)
  • App Dev: Developing a Backend Service - Python (Google via Google Cloud Skills Boost)
  • Building Resilient Streaming Systems on Google Cloud Platform en Français (Google Cloud via Coursera)
  • Cloud Scheduler: Qwik Start (Google via Google Cloud Skills Boost)
  • Continuous Delivery Pipelines with Spinnaker and Kubernetes Engine (Google via Google Cloud Skills Boost)