YoVDO

Real-Time Forecasting at Scale Using Delta Lake and Delta Caching

Offered By: Databricks via YouTube

Tags

Delta Lake Courses
ARIMA Courses
Time Series Forecasting Courses
Programmatic Advertising Courses
Data Pipelines Courses

Course Description

Overview

This 25-minute conference talk from Databricks covers real-time forecasting at scale with Delta Lake and Delta caching. It walks through GumGum's data pipeline and architecture, which processes 30 billion programmatic inventory impressions daily and returns near-real-time inventory forecasts in under 30 seconds. Topics include efficient sampling techniques for Spark jobs, best practices for using Delta Lake, and the advantages of Databricks Delta caching over conventional Spark caching. The talk also explains how GumGum runs time series forecasting with zero downtime, using auto ARIMA and sinusoids to capture trends in inventory data, and touches on AMIND sampling, storing sampled data in Delta Lake, and efficient cluster utilization. It follows the full workflow from data sampling and caching through search and forecast, and closes with other models and forecasting accuracy.
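
To give a rough sense of what "sinusoids to capture inventory data trends" can mean in practice, here is a minimal sketch that fits a linear trend plus sine and cosine terms to a daily series by least squares and extrapolates it. It is not GumGum's implementation: the function names, the weekly period, the number of harmonics, and the synthetic data are all assumptions made for illustration, and the auto ARIMA step mentioned in the talk is left out entirely.

```python
import numpy as np

def fourier_features(t, period, n_harmonics):
    """Build a design matrix: intercept, linear trend, and sin/cos pairs."""
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t), t]
    for k in range(1, n_harmonics + 1):
        angle = 2.0 * np.pi * k * t / period
        cols.append(np.sin(angle))
        cols.append(np.cos(angle))
    return np.column_stack(cols)

def fit_and_forecast(y, period, n_harmonics, horizon):
    """Least-squares fit on the history, then evaluate on future time steps."""
    y = np.asarray(y, dtype=float)
    t_hist = np.arange(len(y))
    X = fourier_features(t_hist, period, n_harmonics)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    t_future = np.arange(len(y), len(y) + horizon)
    return fourier_features(t_future, period, n_harmonics) @ coef

# Synthetic daily impression counts with a weekly cycle (made-up numbers).
days = np.arange(56)
history = 1000.0 + 2.0 * days + 150.0 * np.sin(2.0 * np.pi * days / 7.0)

# Forecast the next 14 days, assuming a 7-day seasonal period.
forecast = fit_and_forecast(history, period=7, n_harmonics=2, horizon=14)
```

A pipeline like the one described in the talk would fit such a model on sampled inventory data rather than a synthetic series, and could model what the sinusoids leave unexplained with ARIMA.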

Syllabus

Introduction
Programmatic Inventory
Why Forecast Inventory
The Scale
Architecture
Data Sampling
Sampling Approach
Types of Sampling
Sampling Daily Job
Search and Forecast
Caching the Data
Workflow
Other Models
Forecasting Accuracy


Taught by

Databricks

Related Courses

Predictive Analytics
Indian Institute of Management Bangalore via edX
Intro to Time Series Analysis in R
Coursera Project Network via Coursera
Pronóstico de la generación de energía eólica y solar fotovoltaica
Galileo University via edX
Time Series Analysis and Forecasting using Python
Udemy
Time Series Analysis in Python: Master Applied Data Analysis
Udemy