YoVDO

Building Modern Data Pipelines with Spark on Azure HDInsight

Offered By: PASS Data Community Summit via YouTube

Tags

PASS Data Community Summit Courses
Data Visualization Courses
Apache Spark Courses
ETL Pipelines Courses
Data Lineage Courses

Course Description

Overview

Explore common patterns for building end-to-end data analytics pipelines using Apache Spark on Azure HDInsight in this conference talk from PASS Summit 2017. Dive into architecture examples, integration points, and various components of modern data pipelines. Learn about edge computing, notebooks, data visualization, and custom visualizations. Discover how to import data, use Hive context, and manage resources with YARN. Examine batch and ETL pipelines, data sources, and pipeline tools like Azure Data Factory. Gain insights into real-time pipelines and data lineage considerations for building robust, scalable data solutions.

Syllabus

Introduction
Evaluations
Agenda
Architecture Example
Apache Spark
On Edge The Inside
Integration Points
HDInsight
Notebook
Notebook Extension
Data Visualization
Custom Visualizations
Importing Data
Hive Context
YARN Resource Manager
Pause Spark Cluster
Batch Pipeline
ETL Pipeline
Data Sources
Visuals Considerations
Hive Tez
Data Lineage
Data Pipeline Tools
Demo Data Factory
Data Pipeline Options
Real-Time Pipelines


Taught by

PASS Data Community Summit

Related Courses

Building Advanced Codeless Pipelines on Cloud Data Fusion
Google Cloud via Coursera
Principles for Data Quality Measures
Pluralsight
Manage workspaces and datasets in Power BI
Microsoft via Microsoft Learn
Building Advanced Codeless Pipelines on Cloud Data Fusion
Google via Qwiklabs
Exploring the Lineage of Data with Cloud Data Fusion
Google Cloud via Coursera