
Migrating ETL Workflows to Apache Spark at Scale - Pinterest's Experience

Offered By: Databricks via YouTube

Tags

Apache Spark Courses, Data Processing Courses, Performance Tuning Courses, Data Migration Courses, Pinterest Courses, ETL Courses

Course Description

Overview

Explore how Pinterest migrated its batch processing to Apache Spark in this 25-minute conference talk from Databricks. Discover the challenges and solutions encountered while moving off legacy ETL workflows written in Cascading and Scalding. Learn about the motivation for the migration, bridging semantic gaps between the engines, handling Thrift objects, improving Spark accumulators, and tuning performance with a Spark profiler. Gain insight into the performance improvements and cost savings achieved after the migration. Topics include Spark clusters, API approaches, translating Cascading and Scalding jobs, secondary sort, accumulator enhancements, profiling techniques, the Automatic Migration Service (AMS), data validation, and balancing performance. The talk closes with the complexities of migrating ETL workflows at scale and the future plans for Pinterest's data processing infrastructure.

Syllabus

Intro
We Are on Cloud
Spark Clusters
Spark Versions and Use Cases
Migration Plan
Migration Path
Spark API
Approach
Translate Cascading
UDF Translation
Translate Scalding
Secondary Sort
Accumulators
Accumulators (Continued)
Accumulator Tab in Spark UI
Profiling
Automatic Migration Service (AMS)
Data Validation
Source of Uncertainty
Performance Tuning
Balancing Performance
Automatic Migration & Failure Handling
Future Plan


Taught by

Databricks
