Cloud Fetch: High-Bandwidth Connectivity for BI Tools - Databricks
Offered By: Databricks via YouTube
Course Description
Overview
Syllabus
Intro
The Business Intelligence use case How BI tools connect to Databricks?
Data growth
Challenges and opportunities Breaking down the extract problem Problem
Fetching query results Result pagination
Importing tables Use internal compute engine
Serving results before Arrow Multiple layers of conversion
Serving results with Arrow Bring results faster to the client
Collecting results in Arrow format Tasks generate Arrow batches
Arrow batch sizing Fetching Arrow batches
Improvements with Arrow Speedups of less than 3x
Extract bottlenecks
New data extract architecture Cloud Fetch system design
Inlining small results Hybrid results
Data layout File sizing and pagination
Fetching results from URLs Parallel file downloads
Cloud Fetch performance Extract faster than BI tools can ingest
Cloud Fetch in the wild Outperforms direct fetch by an order of magnitude
Conclusions Scaled up extract workloads using cloud storage
DATA+AI SUMMIT 2022
Taught by
Databricks
Related Courses
Data Wrangling with MongoDB - MongoDB via Udacity
Data Science Essentials for SAP - OnSAP Academy via Independent
Herramientas de la Inteligencia de Negocios - Galileo University via edX
Digital Media Analytics: Using 'Listening Data' - Purdue University via FutureLearn
Advanced Business Analytics - University of Colorado Boulder via Coursera