
How Adobe Processes 2 Million Records Per Second Using Apache Spark

Offered By: Databricks via YouTube

Tags

Apache Spark Courses, Big Data Courses, Adobe Courses, Redis Courses, Data Processing Courses, Data Ingestion Courses

Course Description

Overview

Explore how Adobe processes 2 million records per second using Apache Spark in this 41-minute Databricks conference talk. Dive into the challenges and solutions of Adobe's Unified Profile System, which ingests terabytes of data daily. Learn about optimizing repeated queries, understanding join operations, monitoring structured streaming lag, handling data skew, effective sampling techniques, and leveraging Redis for enhanced performance. Gain valuable insights from Adobe's experiences in scaling their Apache Spark deployment, including practical tips on caching physical plans, managing shuffles, dealing with backpressure, and making code resilient to skewed datasets. Benefit from real-world war stories and lessons that can be applied to large-scale data processing challenges in your own projects.

Syllabus

Intro
What do you mean by Processing? Agenda!
Unified Profile Data Ingestion
Generic Flow
Flow with MinPartitions partitions on Kafka
MicroBatch Hard! Logic Best Practices
An Example
For Repeated Queries Over Same DF
Join Optimization For Interactive Queries (Opinionated)
How to get the magic targetPartitionCount?
Digging into Redis Pipelining + Spark


Taught by

Databricks

Related Courses

Adobe Illustrator: aprende a crear presentaciones de impacto (Adobe Illustrator: Learn to Create Impactful Presentations)
The Pontificia Universidad Javeriana via edX
Bridge CC 2015 Essential Training
LinkedIn Learning
Fireworks CS6 Essential Training
LinkedIn Learning
FrameMaker 2015 Essential Training
LinkedIn Learning
Learning Adobe Presenter 10
LinkedIn Learning