Faster Data Integration Pipeline Execution Using Spark-Jobserver

Offered By: Databricks via YouTube

Tags

Apache Spark, Big Data, Data Visualization, REST APIs, Data Integration, Kerberos, ETL

Course Description

Overview

Explore a 32-minute conference talk from Databricks on leveraging Spark-Jobserver to enhance data integration pipeline execution. Learn how Informatica utilizes Spark-Jobserver's capabilities to solve data visualization challenges for hierarchical data in Big Data pipelines. Discover the benefits of Spark context reuse for faster task execution, integration techniques using REST APIs, and strategies for managing parallel job execution and monitoring. Gain insights into configuring Spark-Jobserver with YARN cluster mode, handling secure SSL-enabled clusters, and managing multiple Spark-Jobserver instances. Delve into topics such as concurrent job execution, dependency resolution, and the journey of adopting Spark-Jobserver in a data integration product.
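The REST-based integration mentioned above can be sketched roughly as follows. This is a minimal illustration, not the approach shown in the talk: it assumes a Spark-Jobserver instance reachable at `http://localhost:8090` (8090 is the project's default port; the host is a placeholder) and uses the `/jars`, `/contexts`, and `/jobs` routes from the open-source project. The application name `preview`, the class `com.example.PreviewJob`, and the context name `preview-ctx` are hypothetical. Creating one long-lived context and submitting several jobs to it is what allows the context reuse described in the overview.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode

# Assumed address of a running Spark-Jobserver; adjust for a real cluster.
BASE_URL = "http://localhost:8090"


def upload_binary_url(app_name, base_url=BASE_URL):
    """URL to POST a job jar to, registered under `app_name`."""
    return f"{base_url}/jars/{quote(app_name)}"


def create_context_url(context_name, params=None, base_url=BASE_URL):
    """URL to POST to in order to create a long-lived, reusable Spark context."""
    query = urlencode(params or {})
    url = f"{base_url}/contexts/{quote(context_name)}"
    return f"{url}?{query}" if query else url


def submit_job_url(app_name, class_path, context_name, sync=False, base_url=BASE_URL):
    """URL to POST to in order to run a job inside an existing context."""
    query = urlencode({
        "appName": app_name,
        "classPath": class_path,
        "context": context_name,
        "sync": str(sync).lower(),
    })
    return f"{base_url}/jobs?{query}"


def job_status_url(job_id, base_url=BASE_URL):
    """URL to GET in order to poll the status or result of a submitted job."""
    return f"{base_url}/jobs/{quote(job_id)}"


def post(url, data=b""):
    """Send a POST request and decode the JSON response (needs a live server)."""
    request = urllib.request.Request(url, data=data, method="POST")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Print the requests a client would make; nothing is sent over the network here.
    print(upload_binary_url("preview"))
    print(create_context_url("preview-ctx", {"num-cpu-cores": 2, "memory-per-node": "512m"}))
    print(submit_job_url("preview", "com.example.PreviewJob", "preview-ctx", sync=True))
```

Against a live server, a client would upload the jar once, create the context once, and then call `post(submit_job_url(...))` for each preview request, so later jobs skip the context start-up cost that a fresh spark-submit would pay each time.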

Syllabus

Intro
Informatica ETL Pipeline
Dealing with buggy pipelines
Data Preview - Feature Requirements
What the spark-submit based data preview achieved
Execution Profiling Results - Spark-submit
Compare Spark-submit with Spark Job Server
Spark-submit based Architecture
SJS based Architecture
Execution Flow
Spark Job Server vs Spark-submit
Setup Details
Getting started
Environment Variables (local.sh.template)
Application Code Migration
WordCount Example
Running Jobs
Handling Job Dependencies
Multiple Spark Job Servers
Concurrency
Support for Kerberos
HTTPS/SSL Enabled Server
Logging
Key Takeaways
Timeouts (in local.conf.template)
Complex Data Representation in Informatica Developer Tool
Monitoring: Binaries
Monitoring: Spark Context
Monitoring: Jobs
Monitoring: Yarn Job


Taught by

Databricks
