Faster Data Integration Pipeline Execution Using Spark-Jobserver
Offered By: Databricks via YouTube
Syllabus
Intro
Informatica ETL Pipeline
Dealing with buggy pipelines
Data Preview - Feature Requirements
What did spark-submit-based data preview achieve?
Execution Profiling Results - Spark-submit
Compare Spark-submit with Spark Job Server
Spark-submit-based Architecture
SJS-based Architecture
Execution Flow
Spark Job Server vs Spark-submit
Setup Details
Getting started
Environment Variables (local.sh.template)
Application Code Migration
WordCount Example
Running Jobs
Handling Job Dependencies
Multiple Spark Job Servers
Concurrency
Support for Kerberos
HTTPS/SSL Enabled Server
Logging
Key Takeaways
Timeouts (in local.conf.template)
Complex Data Representation in Informatica Developer Tool
Monitoring: Binaries
Monitoring: Spark Context
Monitoring: Jobs
Monitoring: Yarn Job
Taught by
Databricks