Cloud Fetch: High-Bandwidth Connectivity for BI Tools - Databricks
Offered By: Databricks via YouTube
Syllabus
Intro
The Business Intelligence use case: How do BI tools connect to Databricks?
Data growth
Challenges and opportunities: Breaking down the extract problem
Problem
Fetching query results: Result pagination
Importing tables: Use internal compute engine
Serving results before Arrow: Multiple layers of conversion
Serving results with Arrow: Bring results faster to the client
Collecting results in Arrow format: Tasks generate Arrow batches
Arrow batch sizing: Fetching Arrow batches
Improvements with Arrow: Speedups of less than 3x
Extract bottlenecks
New data extract architecture: Cloud Fetch system design
Inlining small results: Hybrid results
Data layout: File sizing and pagination
Fetching results from URLs: Parallel file downloads
Cloud Fetch performance: Extract faster than BI tools can ingest
Cloud Fetch in the wild: Outperforms direct fetch by an order of magnitude
Conclusions: Scaled up extract workloads using cloud storage
DATA+AI SUMMIT 2022
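The outline names the parts of the Cloud Fetch design (hybrid results that inline small outputs, size-bounded result files, and parallel downloads from URLs) without showing how they fit together. Below is a minimal Python sketch of that flow under stated assumptions. Byte strings stand in for serialized Arrow record batches, and a dict stands in for cloud storage. The names `serve_result`, `pack_files`, `fetch_result`, both size thresholds, and the example URL are all hypothetical illustrations. They are not Databricks, driver, or Arrow APIs.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical limits; the talk outline does not state real thresholds.
INLINE_LIMIT_BYTES = 1024   # results at or below this size are inlined
TARGET_FILE_BYTES = 4096    # approximate size of each uploaded result file

Batch = bytes  # stand-in for one serialized Arrow record batch


@dataclass
class QueryResponse:
    """Hybrid result: inline batches for small results, download links otherwise."""
    inline: Optional[List[Batch]] = None
    links: List[str] = field(default_factory=list)


def pack_files(batches: List[Batch], target: int) -> List[List[Batch]]:
    """Group whole batches into files of roughly `target` bytes (never splits a batch)."""
    files: List[List[Batch]] = []
    current: List[Batch] = []
    size = 0
    for batch in batches:
        if current and size + len(batch) > target:
            files.append(current)
            current, size = [], 0
        current.append(batch)
        size += len(batch)
    if current:
        files.append(current)
    return files


def serve_result(batches: List[Batch], store: Dict[str, List[Batch]]) -> QueryResponse:
    """Server side (hypothetical): inline small results, else upload files and return links."""
    if sum(len(b) for b in batches) <= INLINE_LIMIT_BYTES:
        return QueryResponse(inline=list(batches))
    links = []
    for i, file_batches in enumerate(pack_files(batches, TARGET_FILE_BYTES)):
        url = f"https://storage.example.com/results/part-{i:05d}"  # placeholder URL
        store[url] = file_batches  # the dict plays the role of cloud storage
        links.append(url)
    return QueryResponse(links=links)


def fetch_result(resp: QueryResponse,
                 download: Callable[[str], List[Batch]],
                 workers: int = 8) -> List[Batch]:
    """Client side (hypothetical): use inline data, or download every file in parallel."""
    if resp.inline is not None:
        return resp.inline
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in link order, so batch order is preserved
        files = list(pool.map(download, resp.links))
    return [batch for file_batches in files for batch in file_batches]


if __name__ == "__main__":
    storage: Dict[str, List[Batch]] = {}
    small = serve_result([b"x" * 100], storage)
    print(small.inline is not None, len(small.links))  # True 0
    batches = [bytes([i]) * 1000 for i in range(20)]   # 20 batches of 1000 bytes
    big = serve_result(batches, storage)
    print(len(big.links))  # 5
    print(fetch_result(big, storage.__getitem__) == batches)  # True
```

The sketch packs whole batches into each file, so every file can be decoded independently. `ThreadPoolExecutor.map` returns results in submission order, so parallel downloads do not reorder rows. In a real deployment the links would point to cloud storage and `download` would be an HTTP GET.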
Taught by
Databricks