Apache Arrow and Substrait - The Secret Foundations of Data Engineering
Offered By: EuroPython Conference via YouTube
Course Description
Overview
Learn how Apache Arrow and Substrait underpin modern data engineering in this 44-minute conference talk from EuroPython 2023. See how PyArrow, the Python library for Apache Arrow, is becoming the de facto standard for data transfer and interoperability across libraries and languages. Learn about the growing adoption of Substrait as a standard representation for query plans, which allows queries to be routed to, and split across, different engines. The talk covers how popular Python libraries such as pandas and Polars use Arrow, and how compute engines such as Velox, DataFusion, and Acero are adopting both Arrow and Substrait. It closes by building a basic database system on top of Arrow and Substrait with minimal code, illustrating the foundations these technologies provide.
Syllabus
Apache Arrow and Substrait, the secret foundations of Data Engineering — Alessandro Molina
Taught by
EuroPython Conference
Related Courses
Computational Investing, Part I (Georgia Institute of Technology via Coursera)
Introduction to Machine Learning [Введение в машинное обучение] (Higher School of Economics via Coursera)
Mathematics and Python for Data Analysis [Математика и Python для анализа данных] (Moscow Institute of Physics and Technology via Coursera)
Introduction to Python for Data Science (Microsoft via edX)
Python for Data Science (University of California, San Diego via edX)