YoVDO

Streaming Featurization with Ibis, Substrait and Apache Arrow

Offered By: Open Data Science via YouTube

Tags

- Data Engineering Courses
- Big Data Courses
- Machine Learning Courses
- Real-Time Data Processing Courses
- Streaming Data Processing Courses
- Apache Arrow Courses

Course Description

Overview

Explore a collaboration between Two Sigma and Voltron Data to improve the performance of featurization workflows using Ibis, Substrait, and Apache Arrow in this 31-minute conference talk. Learn about the evolution of open-source data science at Two Sigma, the challenges of featurization, and the key components of this software stack: Apache Arrow's columnar in-memory data format, Ibis's high-level Python API for data processing and analysis, and Substrait's cross-language specification for query plans. Discover how the integrated stack supports real-time processing of streaming data to deliver fast, accurate inputs for decision-making. Look ahead to the future of data science interfaces, where a single interface can work with multiple data engines.

Syllabus

- Introductions
- How I Met Wes McKinney
- Timeline of Open Source Data Science at Two Sigma (TS)
- Featurization Challenges
- About Wes McKinney
- Apache Arrow
- Ibis
- Substrait
- One Data Science Interface; Many Data Engines
- Look Ahead


Taught by

Open Data Science

Related Courses

- Conceptualizing the Processing Model for the AWS Kinesis Data Analytics Service (Pluralsight)
- Processing Streaming Data Using Apache Flink (Pluralsight)
- Processing Streaming Data Using Apache Spark Structured Streaming (Pluralsight)
- Exploring the Apache Spark Structured Streaming API for Processing Streaming Data (Pluralsight)
- Exploring the Apache Beam SDK for Modeling Streaming Data for Processing (Pluralsight)