YoVDO

Streaming Featurization with Ibis, Substrait and Apache Arrow

Offered By: Open Data Science via YouTube

Tags

Data Engineering Courses, Big Data Courses, Machine Learning Courses, Real-Time Data Processing Courses, Streaming Data Processing Courses, Apache Arrow Courses

Course Description

Overview

Explore a collaboration between Two Sigma and Voltron Data to improve the performance of featurization workflows using Ibis, Substrait, and Apache Arrow in this 31-minute conference talk. Learn about the evolution of open-source data science at Two Sigma, the challenges of featurization, and the key components of this software stack. Dive into Apache Arrow's high-performance columnar in-memory data representation, Ibis's high-level Python APIs for data processing and analysis, and Substrait's cross-language specification for representing query plans. Discover how this integrated stack supports real-time processing of streaming data, delivering fast, accurate insights for decision-making. Gain insight into the future of data science interfaces and their potential to work with multiple data engines.

Syllabus

- Introductions
- How I Met Wes McKinney
- Timeline of Open Source Data Science at Two Sigma (TS)
- Featurization Challenges
- About Wes McKinney
- Apache Arrow
- Ibis
- Substrait
- One Data Science Interface; Many Data Engines
- Look Ahead


Taught by

Open Data Science

Related Courses

Machine Learning with RAPIDS - Accelerating Data Science Workflows
Nvidia via YouTube
Sound Data Engineering in Rust - From Bits to DataFrames
Databricks via YouTube
DataFusion and Apache Arrow: Supercharging Data Analytics with a Rust-Based Query Engine
Databricks via YouTube
Cloud Fetch: High-Bandwidth Connectivity for BI Tools - Databricks
Databricks via YouTube
Data Science Across Data Sources with Apache Arrow - Accelerating Analytics and Interoperability
Databricks via YouTube