YoVDO

User Defined Aggregation in Apache Spark - From Challenges to Improvements

Offered By: Databricks via YouTube

Tags

Apache Spark Courses
Databricks Courses
Data Processing Courses

Course Description

Overview

This 21-minute Databricks conference talk traces the evolution of User Defined Aggregate Functions (UDAFs) in Apache Spark, from the flaws of the original design to the improved design in Spark 3.0. It covers how to build your own UDAF library, how User Defined Aggregation works internally, and how the latest UDAF features improve both usability and performance. The speaker also recounts personal experience with UDAFs, describes the Apache Spark code review process, and shares tips for contributing large features to the upstream community.

Syllabus

Intro
Spark's Scale-Out World
Scale-Out Sum
Spark Aggregators
Data Sketching: T-Digest
Is T-Digest an Aggregator?
Romantic Chemistry
Romantic Montage
UDAF Anatomy
What Could Go Wrong?
Wait What?
SPARK-27296
Aggregator Anatomy
Intuitive Serialization
Custom Aggregation in Spark 3.0
Performance
Don't Give Up
Patience
Respect


Taught by

Databricks

Related Courses

Coding the Matrix: Linear Algebra through Computer Science Applications
Brown University via Coursera
How Machines Think: An Introduction to Computing Technologies (كيف تفكر الآلات - مقدمة في تقنيات الحوسبة)
King Fahd University of Petroleum and Minerals via Rwaq (رواق)
Data Science and Situational Analysis: Behind the Scenes of Big Data (Datascience et Analyse situationnelle : dans les coulisses du Big Data)
IONIS via IONIS
Data Lakes for Big Data
EdCast
Statistics I: Fundamentals of Data Analysis (統計学Ⅰ:データ分析の基礎) (ga014)
University of Tokyo via gacco