Cloud Fetch: High-Bandwidth Connectivity for BI Tools - Databricks
Offered By: Databricks via YouTube
Course Description
Overview
Syllabus
Intro
The Business Intelligence use case How BI tools connect to Databricks?
Data growth
Challenges and opportunities Breaking down the extract problem Problem
Fetching query results Result pagination
Importing tables Use internal compute engine
Serving results before Arrow Multiple layers of conversion
Serving results with Arrow Bring results faster to the client
Collecting results in Arrow format Tasks generate Arrow batches
Arrow batch sizing Fetching Arrow batches
Improvements with Arrow Speedups of less than 3x
Extract bottlenecks
New data extract architecture Cloud Fetch system design
Inlining small results Hybrid results
Data layout File sizing and pagination
Fetching results from URLs Parallel file downloads
Cloud Fetch performance Extract faster than BI tools can ingest
Cloud Fetch in the wild Outperforms direct fetch by an order of magnitude
Conclusions Scaled up extract workloads using cloud storage
DATA+AI SUMMIT 2022
Taught by
Databricks
Related Courses
Architecting Microsoft Azure Solutions - Microsoft via edX
Computing, Storage and Security with Google Cloud Platform - Google via Coursera
Windows Server 2016: Azure for On-Premises Administrators - Microsoft via edX
Microsoft Professional Orientation: Cloud Administration - Microsoft via edX
IT Support: Troubleshooting Microsoft Office - Microsoft via edX