YoVDO

Cloud Fetch: High-Bandwidth Connectivity for BI Tools - Databricks

Offered By: Databricks via YouTube

Tags

Business Intelligence Courses Cloud Computing Courses Data Warehousing Courses Data Extraction Courses Cloud Storage Courses Apache Arrow Courses

Course Description

Overview

Explore high-bandwidth connectivity for BI tools through Cloud Fetch in this 20-minute Databricks video. Learn how to overcome the data transfer bottleneck that traditional data warehouses face when Business Intelligence tools such as Tableau and Microsoft Power BI extract large query results. Discover a new mechanism that fetches results in parallel through cloud storage, such as AWS S3 and Azure Data Lake Storage, which can deliver a 10x speed-up in extract performance. Dive into the challenges posed by data growth, the intricacies of result pagination, and the improvements made with Apache Arrow. Understand the new data extract architecture, including hybrid results, data layout considerations, and parallel file downloads. Gain insights into Cloud Fetch's real-world performance and how it scales up extract workloads using cloud storage, ultimately enabling faster data ingestion for BI tools.
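The core idea in the overview is to fetch one large result as several cloud-storage files downloaded in parallel, instead of as a single sequential stream. It can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Databricks' implementation: `fetch_all`, `fake_download`, and the example URLs are hypothetical, and the downloader is passed in as a function so the sketch runs without network access or a cloud account.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def fetch_all(urls: List[str], download: Callable[[str], bytes], max_workers: int = 8) -> bytes:
    """Download every result file concurrently and reassemble them in order.

    Hypothetical helper: `download` stands in for an HTTP GET against a
    cloud-storage URL (for example a presigned S3 link).
    """
    # ThreadPoolExecutor.map runs the downloads in parallel but yields
    # results in the same order as `urls`, so row order is preserved.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        chunks = list(pool.map(download, urls))
    return b"".join(chunks)


# Hypothetical in-memory "cloud storage" used only for this example.
fake_storage = {
    "https://storage.example/result/part-0": b"row1\nrow2\n",
    "https://storage.example/result/part-1": b"row3\n",
    "https://storage.example/result/part-2": b"row4\nrow5\n",
}


def fake_download(url: str) -> bytes:
    return fake_storage[url]


if __name__ == "__main__":
    print(fetch_all(list(fake_storage), fake_download).decode(), end="")
```

Because the files are independent, throughput grows with the number of concurrent downloads until the client's network link or the BI tool's ingestion rate becomes the limit.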

Syllabus

Intro
The Business Intelligence use case: How do BI tools connect to Databricks?
Data growth
Challenges and opportunities: Breaking down the extract problem
Fetching query results: Result pagination
Importing tables: Use internal compute engine
Serving results before Arrow: Multiple layers of conversion
Serving results with Arrow: Bring results faster to the client
Collecting results in Arrow format: Tasks generate Arrow batches
Arrow batch sizing: Fetching Arrow batches
Improvements with Arrow: Speedups of less than 3x
Extract bottlenecks
New data extract architecture: Cloud Fetch system design
Inlining small results: Hybrid results
Data layout: File sizing and pagination
Fetching results from URLs: Parallel file downloads
Cloud Fetch performance: Extract faster than BI tools can ingest
Cloud Fetch in the wild: Outperforms direct fetch by an order of magnitude
Conclusions: Scaled up extract workloads using cloud storage
DATA+AI SUMMIT 2022
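The syllabus items on inlining small results, hybrid results, and file sizing suggest a server-side decision: return small results directly in the response, and write large ones to cloud storage as files of bounded size that the client then fetches by URL. The toy planner below assumes that design. `plan_result`, `QueryResult`, the `upload` callback, and both byte thresholds are hypothetical choices for illustration, not APIs or values from the talk.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class QueryResult:
    # Exactly one of these is populated (hypothetical response shape).
    inline_batches: List[bytes] = field(default_factory=list)
    file_urls: List[str] = field(default_factory=list)


def plan_result(
    batches: List[bytes],
    upload: Callable[[int, bytes], str],
    inline_limit: int = 1_000_000,       # hypothetical cutoff for inlining
    target_file_bytes: int = 4_000_000,  # hypothetical target file size
) -> QueryResult:
    """Inline small results; otherwise group batches into files and upload them."""
    total = sum(len(b) for b in batches)
    if total <= inline_limit:
        # Small result: a round trip to cloud storage would only add latency.
        return QueryResult(inline_batches=list(batches))

    # Large result: pack consecutive batches into files of roughly
    # target_file_bytes, keeping batch order intact. A single batch larger
    # than the target becomes a file on its own.
    groups: List[List[bytes]] = [[]]
    size = 0
    for batch in batches:
        if size and size + len(batch) > target_file_bytes:
            groups.append([])
            size = 0
        groups[-1].append(batch)
        size += len(batch)

    # `upload` stands in for writing a file and returning a download URL.
    urls = [upload(i, b"".join(group)) for i, group in enumerate(groups)]
    return QueryResult(file_urls=urls)
```

The file-size target trades per-request overhead (too many tiny files) against download parallelism (too few huge files); the value used here is arbitrary.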


Taught by

Databricks
