Introducing Apache Spark 3.0 - A Decade of Progress and Future Outlook
Offered By: Databricks via YouTube
Course Description
Overview
Explore the evolution and future of Apache Spark in this keynote from Spark + AI Summit 2020, featuring Matei Zaharia, the original creator of Apache Spark, and Brooke Wenig. Delve into the major community developments in the Apache Spark 3.0 release, which aims to improve usability, performance, and compatibility with a wide range of data sources and runtime environments. Discover how Spark 3.0 advances the project's goal of making data processing more accessible through improvements to the SQL and Python APIs and through automatic tuning and optimization features. Reflect on the ten years since Spark's initial open source release, examining the project's growth, its expanding user base, and the ecosystem that has grown around it, including Koalas, Delta Lake, and visualization tools. Finally, get an overview of the latest developments, including Apache Spark 3.0 and Databricks Runtime (DBR) 7.0, and learn about Databricks' unified data analytics platform built on Apache Spark.
Syllabus
This is a Special Year for Apache Spark
2008: Datacenter-scale computing
2009: Back to Berkeley
2010: Open Source Spark
2012-15: Expand Access to Spark
Apache Spark Today: Python
Apache Spark Today: SQL
Major Lessons
Apache Spark 3.0
Spark 3.0: SQL Engine
Spark 3.0: Python Usability - Python type hints for Pandas UDFs
Spark 3.0: Python and R Performance
Spark 3.0: Other Features
Other Apache Spark Ecosystem Projects
Announcing Koalas 1.0!
Learning Spark 2nd Edition
OSS Spark Development Initiatives at Databricks
Taught by
Databricks