Deep Dive into New Features of Apache Spark 3.1

Offered By: Databricks via YouTube

Tags

Apache Spark Courses, Big Data Courses, SQL Courses, PySpark Courses, Data Processing Courses, Data Engineering Courses

Course Description

Overview

This 49-minute Databricks video walks through the most important changes in Apache Spark 3.1, a release that resolved more than 1,500 JIRA tickets with the goal of making Spark faster, easier, and smarter. Through examples and demos, it covers the SQL features added for ANSI compliance (ANSI SQL mode, unified CREATE TABLE syntax, and CHAR/VARCHAR support), node decommissioning, and performance optimizations and new tuning techniques in the query compiler: shuffle hash join improvements, partition pruning, predicate pushdown, and reduced query compiling latency. It also covers streaming additions (stream-stream joins and the state store for Structured Streaming), Python usability enhancements (PySpark type hints, static error detection, and Python dependency management), and new utility functions for Unix time and time zones. The video closes with usability enhancements, documentation updates, important deprecations and removals, and a look at upcoming major initiatives.

Syllabus

Intro
ANSI SQL Compliance
Fail Earlier for Invalid Data
Forbid Confusing CAST
ANSI Mode GA in Spark 3.2
Unified CREATE TABLE SQL Syntax
CHAR/VARCHAR Support
More ANSI Features Coming in Spark 3.2!
Node Decommissioning
Summary
SQL Performance
Shuffle Hash Join Improvement
Partition Pruning Improvement
Predicate Pushdown Improvement
Reduce Query Compiling Latency (3.2)
Stream-stream Join
State Store for Structured Streaming
RocksDB State Store
Add the type hints (PEP 484) to PySpark!
Static Error Detection
Python Dependency Management
Visualization and Plotting
Usability Enhancements
New Utility Functions for Unix Time
New Utility Functions for Time Zone
EXPLAIN FORMATTED
Ignore Hints
Documentation and Environments
New Doc for PySpark
Deprecations and Removals


Taught by

Databricks

Related Courses

In-Memory Database Management (内存数据库管理)
openHPI
CS115x: Advanced Apache Spark for Data Science and Data Engineering
University of California, Berkeley via edX
Processing Big Data with Azure Data Lake Analytics
Microsoft via edX
Google Cloud Big Data and Machine Learning Fundamentals en Español
Google Cloud via Coursera
Google Cloud Big Data and Machine Learning Fundamentals 日本語版
Google Cloud via Coursera