Data Science Across Data Sources with Apache Arrow - Accelerating Analytics and Interoperability

Offered By: Databricks via YouTube

Tags

Apache Arrow Courses
Cloud Computing Courses
Microservices Courses
PySpark Courses
Data Lakes Courses
Data Processing Courses

Course Description

Overview

This 37-minute conference talk from Databricks introduces Apache Arrow, an open-source, columnar, in-memory data representation that lets analytical systems and data sources exchange and process data in real time. The talk explains how Arrow simplifies and accelerates data access without physically consolidating data into a central repository, which addresses a common challenge in microservice and cloud application environments. It covers Arrow's integration with open-source and commercial technologies, including GPU databases, machine learning libraries, execution engines, and visualization frameworks, and cites performance gains such as a 50x speedup in PySpark. Topics include Dremel, data lake storage, data consumers, data warehouses, and cloud caching. The talk ends with a practical demonstration of Arrow using Python and Spark examples.

Syllabus

Intro
About Dremel
Data Lake Storage
Data Consumers
Data Warehouse
Apache Arrow
Exponential Growth
Gandiva
Performance Improvements
Cloud Cache
Arrow Flight
Python Example
Spark Example
Demo Overview
Demo
Demo with Python


Taught by

Databricks

Related Courses

Machine Learning with RAPIDS - Accelerating Data Science Workflows
Nvidia via YouTube
Streaming Featurization with Ibis, Substrait and Apache Arrow
Open Data Science via YouTube
Sound Data Engineering in Rust - From Bits to DataFrames
Databricks via YouTube
DataFusion and Apache Arrow: Supercharging Data Analytics with a Rust-Based Query Engine
Databricks via YouTube
Cloud Fetch: High-Bandwidth Connectivity for BI Tools - Databricks
Databricks via YouTube