How Adobe Processes 2 Million Records Per Second Using Apache Spark

Offered By: Databricks via YouTube

Tags

Apache Spark Courses, Big Data Courses, Adobe Courses, Redis Courses, Data Processing Courses, Data Ingestion Courses

Course Description

Overview

Explore how Adobe processes 2 million records per second using Apache Spark in this 41-minute Databricks conference talk. Dive into the challenges and solutions of Adobe's Unified Profile System, which ingests terabytes of data daily. Learn about optimizing repeated queries, understanding join operations, monitoring structured streaming lag, handling data skew, effective sampling techniques, and leveraging Redis for enhanced performance. Gain valuable insights from Adobe's experiences in scaling their Apache Spark deployment, including practical tips on caching physical plans, managing shuffles, dealing with backpressure, and making code resilient to skewed datasets. Benefit from real-world war stories and lessons that can be applied to large-scale data processing challenges in your own projects.

Syllabus

Intro
What do you mean by Processing?
Agenda!
Unified Profile Data Ingestion
Generic Flow
Flow with MinPartitions partitions on Kafka
MicroBatch Hard! Logic Best Practices
An Example
For Repeated Queries Over Same DF
Join Optimization For Interactive Queries (Opinionated)
How to get the magic targetPartitionCount?
Digging into Redis Pipelining + Spark

Taught by

Databricks

Related Courses

Coding the Matrix: Linear Algebra through Computer Science Applications
Brown University via Coursera
How Machines Think - An Introduction to Computing Technologies
King Fahd University of Petroleum and Minerals via Rwaq (رواق)
Data Science and Situational Analysis: Behind the Scenes of Big Data
IONIS via IONIS
Data Lakes for Big Data
EdCast
Statistics I: Fundamentals of Data Analysis (ga014)
University of Tokyo via gacco