Faster Data Integration Pipeline Execution Using Spark-Jobserver
Offered By: Databricks via YouTube
Course Description
Overview
Syllabus
Intro
Informatica ETL Pipeline
Dealing with buggy pipelines
Data Preview - Feature Requirements
What did spark-submit based data preview achieve?
Execution Profiling Results - Spark-submit
Compare Spark-submit with Spark Job Server
Spark-submit based Architecture
SJS based Architecture
Execution Flow
Spark Job Server vs Spark-submit
Setup Details
Getting started
Environment Variables (local.sh.template)
Application Code Migration
WordCount Example
Running Jobs
Handling Job Dependencies
Multiple Spark Job Servers
Concurrency
Support for Kerberos
HTTPS/SSL Enabled Server
Logging
Key Takeaways
Timeouts (in local.conf.template)
Complex Data Representation in Informatica Developer Tool
Monitoring: Binaries
Monitoring: Spark Context
Monitoring: Jobs
Monitoring: Yarn Job
Taught by
Databricks
Related Courses
Windows Server Management and Security
University of Colorado System via Coursera
Cyber Attack Countermeasures
New York University (NYU) via Coursera
CompTIA Network+ (N10-007) Cert Prep: 5 Securing TCP/IP
LinkedIn Learning
Access Control Mechanisms in Linux
Pluralsight
Cloudera Hadoop Administration
YouTube