Streaming Featurization with Ibis, Substrait and Apache Arrow

Offered By: Open Data Science via YouTube

Tags

- Data Engineering Courses
- Big Data Courses
- Machine Learning Courses
- Real-Time Data Processing Courses
- Streaming Data Processing Courses
- Apache Arrow Courses

Course Description

Overview

Explore a collaborative effort between Two Sigma and Voltron Data to improve the performance of featurization workflows using Ibis, Substrait, and Apache Arrow in this 31-minute conference talk. Learn about the evolution of open-source data science at Two Sigma, the challenges of featurization, and the key components of this software stack: Apache Arrow's high-performance columnar in-memory data representation, Ibis's high-level Python APIs for data processing and analysis, and Substrait's cross-language specification for representing query plans. Discover how the integrated stack supports processing of real-time streaming data, and what a single data science interface that works with multiple data engines could look like.

Syllabus

- Introductions
- How I Met Wes McKinney
- Timeline of Open Source Data Science at Two Sigma (TS)
- Featurization Challenges
- About Wes McKinney
- Apache Arrow
- Ibis
- Substrait
- One Data Science Interface; Many Data Engines
- Look Ahead


Taught by

Open Data Science

Related Courses

Introduction to Artificial Intelligence
Stanford University via Udacity
Natural Language Processing
Columbia University via Coursera
Probabilistic Graphical Models 1: Representation
Stanford University via Coursera
Computer Vision: The Fundamentals
University of California, Berkeley via Coursera
Learning from Data (Introductory Machine Learning course)
California Institute of Technology via Independent