YoVDO

Configuration-Driven Reporting on Large Datasets Using Apache Spark

Offered By: Databricks via YouTube

Tags

Apache Spark Courses, Big Data Courses, Data Transformation Courses, Data Processing Courses, Data Aggregation Courses, Configuration Management Courses

Course Description

Overview

Explore a 27-minute conference talk from Databricks on configuration-driven reporting for large datasets using Apache Spark. Dive into the challenges of processing petabytes of financial transactional data and learn about American Express's next-generation reporting framework. Discover how this highly configurable enterprise solution enables the generation of hundreds of different reports for thousands of partners without additional development. Gain insights into dynamic scheduling, data transformation, aggregation, and filtering using Spark's in-memory and parallel processing capabilities. Understand the implementation of business rules and the integration of template engines like FreeMarker and Mustache. Examine the technical components, configuration file structure, and various stages of the reporting process, including schema application, data lookup, transformation rules, and template application. Assess the success metrics of this innovative approach to handling large-scale reporting needs in the financial industry.
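
To make the idea of a configuration-driven report more concrete, here is a minimal sketch of the four stages named in the talk (schema application, data lookup, transformation rules, template application). It is not the American Express framework and does not use Spark: it runs on plain Python lists so it can be tried anywhere, and it swaps in Python's built-in string.Template where the talk mentions FreeMarker and Mustache. Every configuration key, function name, and sample record below is a hypothetical chosen for illustration.

```python
import csv
import io
from collections import defaultdict
from string import Template

# Hypothetical report configuration. All key names are assumptions made for
# this sketch; the talk's real configuration file format is not reproduced here.
REPORT_CONFIG = {
    "schema": {"partner_id": str, "merchant_code": str, "amount": float},
    "lookup": {"key": "merchant_code", "target": "merchant_name"},
    "filter": {"field": "amount", "min": 10.0},
    "group_by": "merchant_name",
    "aggregate": {"field": "amount", "as": "total"},
    "template": "$merchant_name: $total",
}

# Hypothetical reference data and input records.
MERCHANTS = {"M1": "Acme", "M2": "Globex"}
RAW = """partner_id,merchant_code,amount
P1,M1,25.50
P1,M2,5.00
P1,M1,14.50
P1,M2,40.00
"""


def apply_schema(rows, schema):
    """Stage 1: keep the configured columns and cast each to its type."""
    return [{name: cast(row[name]) for name, cast in schema.items()} for row in rows]


def data_lookup(rows, cfg, table):
    """Stage 2: enrich each row with a value looked up from reference data."""
    return [{**row, cfg["target"]: table.get(row[cfg["key"]], "UNKNOWN")} for row in rows]


def apply_rules(rows, config):
    """Stage 3: filter rows, then sum the configured field per group."""
    flt, agg = config["filter"], config["aggregate"]
    totals = defaultdict(float)
    for row in rows:
        if row[flt["field"]] >= flt["min"]:
            totals[row[config["group_by"]]] += row[agg["field"]]
    return [{config["group_by"]: key, agg["as"]: value}
            for key, value in sorted(totals.items())]


def apply_template(rows, template):
    """Stage 4: render each aggregated row through the configured template."""
    tmpl = Template(template)
    return [
        tmpl.substitute({k: f"{v:.2f}" if isinstance(v, float) else v
                         for k, v in row.items()})
        for row in rows
    ]


def run_report(raw_csv, config, lookup_table):
    """Run all four stages; the config alone decides what the report contains."""
    rows = list(csv.DictReader(io.StringIO(raw_csv)))
    rows = apply_schema(rows, config["schema"])
    rows = data_lookup(rows, config["lookup"], lookup_table)
    rows = apply_rules(rows, config)
    return apply_template(rows, config["template"])


REPORT_LINES = run_report(RAW, REPORT_CONFIG, MERCHANTS)
print(REPORT_LINES)  # ['Acme: 40.00', 'Globex: 40.00']
```

Changing only the configuration, for example lowering the filter's "min" to 0.0, produces a different report from the same code, which is the property the talk describes as generating new reports without additional development. In a Spark setting the same stages would presumably map onto DataFrame operations, but that mapping is an assumption and is not shown here.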

Syllabus

Intro
Introduction: What Is a Reporting Framework?
Statistics and General Need
Pattern: Need for Configuration-Based Reporting
Technical Components
A Sample Configuration File
Apply Schema Stage
Data Lookup Stage
Apply Transformation Rules Stage
Apply Transformation Rules (continued...)
Apply Template
Success Metrics


Taught by

Databricks

Related Courses

Introduction aux conteneurs (Microsoft Virtual Academy via OpenClassrooms)
DevOps for Developers: How to Get Started (Microsoft via edX)
Configuration Management on Google Cloud Platform (Google via Coursera)
Windows Server 2016: Infrastructure (Microsoft via edX)
Introduction to SAP HANA Administration (SAP Learning)