Faster Data Integration Pipeline Execution Using Spark-Jobserver
Offered By: Databricks via YouTube
Course Description
Overview
Syllabus
Intro
Informatica ETL Pipeline
Dealing with buggy pipelines
Data Preview - Feature Requirements
What did spark-submit based data preview achieve?
Execution Profiling Results - Spark-submit
Compare Spark-submit with Spark Job Server
Spark-submit based Architecture
SJS based Architecture
Execution Flow
Spark Job Server vs Spark-submit
Setup Details
Getting started
Environment Variables (local.sh.template)
Application Code Migration
WordCount Example
Running Jobs
Handling Job Dependencies
Multiple Spark Job Servers
Concurrency
Support for Kerberos
HTTPS/SSL Enabled Server
Logging
Key Takeaways
Timeouts (in local.conf.template)
Complex Data Representation in Informatica Developer Tool
Monitoring: Binaries
Monitoring: Spark Context
Monitoring: Jobs
Monitoring: Yarn Job
Taught by
Databricks
Related Courses
Web Engineering II: Developing Mobile HTML5 Apps (Technische Hochschule Mittelhessen via iversity)
Introduction to MongoDB using the MEAN Stack (MongoDB via edX)
Desarrollo de aplicaciones avanzadas con Android (Universidad Nacional Autónoma de México via Coursera)
Utilisez des API REST dans vos projets web (IBM via OpenClassrooms)
Extend Your Application with REST Services (Microsoft via edX)