YoVDO

Project Zen - Improving Apache Spark for Python Users

Offered By: Databricks via YouTube

Tags

Apache Spark Courses, Data Science Courses, Big Data Courses, Python Courses, pandas Courses, PySpark Courses, PyPI Courses

Course Description

Overview

This 21-minute conference talk from Databricks covers recent improvements to Apache Spark for Python users. It introduces Project Zen, an initiative to make PySpark more Pythonic and user-friendly, and covers the redesigned pandas UDFs, improved error messages in UDFs, and other new features introduced in Apache Spark 3.0 and 3.1. The talk then walks through the Project Zen roadmap: redesigned PySpark documentation, PySpark type hints, new installation options for PyPI users, standardized warnings and exceptions, and visualization improvements. It also discusses the rapid growth of the PySpark user base, the increasing importance of Python in data science, and how these enhancements follow the principles of The Zen of Python to make PySpark more efficient and intuitive to use.

Syllabus

Intro
Python Growth
PySpark Today
The Zen of Python
Project Zen (SPARK-32082)
Problems in PySpark Documentation
New PySpark Documentation
New API Reference
Quickstart
Live Notebook
Other New Pages
What are Python Type Hints?
Why are Python Type Hints good?
Python Type Hints in PySpark
PyPI Distribution
New Installation Options
Why not pip --install-option?
Roadmap
Re-cap: What's next?


Taught by

Databricks

Related Courses

CS115x: Advanced Apache Spark for Data Science and Data Engineering
University of California, Berkeley via edX
Big Data Analytics
University of Adelaide via edX
Big Data Essentials: HDFS, MapReduce and Spark RDD
Yandex via Coursera
Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames
Yandex via Coursera
Introduction to Apache Spark and AWS
University of London International Programmes via Coursera