
Preventing Common Pitfalls in Production Streaming Jobs

Offered By: Databricks via YouTube

Tags

Data Streaming Courses Apache Spark Courses Databricks Courses Fault Tolerance Courses

Course Description

Overview

Explore critical aspects of running streaming jobs in production environments through this 54-minute conference talk by Databricks. Learn how to prevent common pitfalls that can cause serious issues when productionizing streaming jobs. The talk covers four key topics: configuring input parameters to handle unexpected increases in data volume, tuning stateful streaming parameters to avoid unbounded state accumulation, optimizing Structured Streaming output parameters to prevent small-file problems, and modifying streaming jobs in production when checkpoints are in use. Each topic comes with practical, hands-on examples of how the issue manifests and how to prevent it. Equip yourself with the knowledge to design performant and fault-tolerant streams that run smoothly in production.
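As a rough illustration of the four topics, the PySpark sketch below caps input with `maxFilesPerTrigger`, bounds aggregation state with a watermark, uses a processing-time trigger so each micro-batch writes fewer, larger files, and sets a checkpoint location. It is not taken from the talk: the helper names, the paths, the column names (`event_time`, `device_id`), and every numeric value are assumptions chosen for illustration.

```python
# Sketch only: all names and values here are illustrative assumptions.


def bounded_input_options(max_files_per_trigger: int = 100) -> dict:
    """Reader options that cap how many files one micro-batch picks up."""
    if max_files_per_trigger <= 0:
        raise ValueError("max_files_per_trigger must be positive")
    return {"maxFilesPerTrigger": str(max_files_per_trigger)}


def build_query(spark, source_path, sink_path, checkpoint_path):
    """Wire the four topics together. Requires a live SparkSession with Delta."""
    # Imported lazily so the helper above can be used without Spark installed.
    from pyspark.sql import functions as F

    # 1. Input: limit the data volume per micro-batch.
    events = (
        spark.readStream.format("delta")
        .options(**bounded_input_options())
        .load(source_path)
    )

    # 2. State: the watermark lets Spark drop state for windows older than
    #    the threshold instead of keeping it forever.
    counts = (
        events.withWatermark("event_time", "10 minutes")
        .groupBy(F.window("event_time", "5 minutes"), "device_id")
        .count()
    )

    # 3. Output: a longer trigger interval means fewer, larger output files.
    # 4. Checkpoint: the location a restarted query resumes from.
    return (
        counts.writeStream.format("delta")
        .outputMode("append")
        .option("checkpointLocation", checkpoint_path)
        .trigger(processingTime="1 minute")
        .start(sink_path)
    )
```

With a running session, `build_query(spark, "/data/in", "/data/out", "/chk/q1")` would start the query; those paths are hypothetical. Suitable values for the file cap, watermark delay, and trigger interval depend on the workload.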

Syllabus

Introduction
Agenda
Input Parameters
Tuning Max Files per Trigger
Tuning State Parameters
Performing Aggregates
Watermark
Help Function
State Store Provider
State Store Limits
Delta-Backed State
Delta-Backed State Code
Performance
Output Parameters
Small Things to Consider


Taught by

Databricks

Related Courses

CS115x: Advanced Apache Spark for Data Science and Data Engineering
University of California, Berkeley via edX
Big Data Analytics
University of Adelaide via edX
Big Data Essentials: HDFS, MapReduce and Spark RDD
Yandex via Coursera
Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames
Yandex via Coursera
Introduction to Apache Spark and AWS
University of London International Programmes via Coursera