
PySpark in Apache Spark 3.3 and Beyond

Offered By: Databricks via YouTube

Tags

PySpark Courses
Data Science Courses
Python Courses
Apache Spark Courses
Data Engineering Courses

Course Description

Overview

Explore the latest advancements in PySpark introduced with Apache Spark 3.3 and get a glimpse of future developments in this 35-minute Databricks conference talk. Follow the evolution of PySpark since Project Zen began in Apache Spark 3.0, including improved error messages, type hints for autocompletion, and visualization implementations. Learn about the popular Pandas API on Spark, introduced in Apache Spark 3.2, which exposes the pandas API on top of Apache Spark. Discover the new features in Apache Spark 3.3: expanded API coverage, faster default indexing in Pandas API on Spark, datetime.timedelta support, a new PyArrow batch interface, enhanced autocompletion, a Python and Pandas UDF profiler, and new error classification. Gain insight into the current efforts and roadmap for PySpark beyond Apache Spark 3.3, covering functionality, productivity, usability, performance, and feature parity.

Syllabus

Intro
Who are you?
Project Zen
What is this talk about?
Pandas API on Spark
New Functionalities
Productivity
Usability
Performance
Feature parity
PySpark in Apache Spark 3.3
PySpark in future Apache Spark
DATA-AI SUMMIT 2022


Taught by

Databricks

Related Courses

CS115x: Advanced Apache Spark for Data Science and Data Engineering
University of California, Berkeley via edX
Big Data Analytics
University of Adelaide via edX
Big Data Essentials: HDFS, MapReduce and Spark RDD
Yandex via Coursera
Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames
Yandex via Coursera
Introduction to Apache Spark and AWS
University of London International Programmes via Coursera