Data Science on Google Cloud Platform: Building Data Pipelines
Offered By: LinkedIn Learning
Course Description
Overview
Learn how to design and build big data pipelines on Google Cloud Platform.
Syllabus
Introduction
- What goes into a data pipeline?
- Data science modules covered
- GCP data pipeline options
- Cloud Dataproc
- Cloud Dataflow
- Cloud Pub/Sub
- What is Apache Beam?
- Beam pipelines
- PCollections
- Transforms
- Pipeline I/O
- Runners
- Setting up GCP for Dataflow
- Setting up Python
- Creating a simple pipeline
- Executing in Dataflow
- Reading text files
- ParDo
- GroupBy
- Map
- Combine
- Writing data to text files
- Other capabilities
- What is Pub/Sub?
- Topics and messages
- Publishers
- Subscribers
- Create a topic
- Create a subscription
- Publish and receive
- Python SDK
- Streaming with Dataflow
- Windowing with Dataflow
- Streaming and windowing example
- Next steps
Taught by
Kumaran Ponnambalam
Related Courses
- Coding the Matrix: Linear Algebra through Computer Science Applications, Brown University via Coursera
- How Machines Think: An Introduction to Computing Technologies, King Fahd University of Petroleum and Minerals via Rwaq (رواق)
- Data Science and Situational Analysis: Behind the Scenes of Big Data, IONIS via IONIS
- Data Lakes for Big Data, EdCast
- Statistics I: Fundamentals of Data Analysis (ga014), University of Tokyo via gacco