Apache Arrow and Substrait - The Secret Foundations of Data Engineering
Offered By: EuroPython Conference via YouTube
Course Description
Overview
Discover the impact of Apache Arrow and Substrait on data engineering in this 44-minute conference talk from EuroPython 2023. Explore how PyArrow, the Python library for Apache Arrow, is becoming the de facto standard for data transfer and interoperability across libraries and languages. Learn about the growing adoption of Substrait as a standard representation for query plans, which allows queries to be routed to, and decomposed across, different engines. See how popular Python libraries such as pandas and Polars use Arrow, and how compute engines such as Velox, DataFusion, and Acero are adopting both Arrow and Substrait. Watch a basic database system being built on Arrow and Substrait with minimal code, showing the foundations these technologies provide for modern data engineering.
Syllabus
Apache Arrow and Substrait, the secret foundations of Data Engineering — Alessandro Molina
Taught by
EuroPython Conference
Related Courses
Machine Learning with RAPIDS - Accelerating Data Science Workflows
Nvidia via YouTube
Streaming Featurization with Ibis, Substrait and Apache Arrow
Open Data Science via YouTube
Sound Data Engineering in Rust - From Bits to DataFrames
Databricks via YouTube
DataFusion and Apache Arrow: Supercharging Data Analytics with a Rust-Based Query Engine
Databricks via YouTube
Cloud Fetch: High-Bandwidth Connectivity for BI Tools - Databricks
Databricks via YouTube