
Efficient Query Processing for Unstructured Data Using Machine Learning

Offered By: Databricks via YouTube

Tags

Machine Learning Courses
Database Management Courses
Data Aggregation Courses
Unstructured Data Courses

Course Description

Overview

Explore efficient query processing techniques for unstructured data using machine learning in this 27-minute conference talk from Databricks. Learn about TASTI, a system developed by the Stanford DAWN lab to reduce the cost of queries over unstructured data. Discover how proxy scores can accelerate aggregation, selection, and limit queries, and how these scores are generated by clustering unstructured data records in a principled way. Gain insights into real-world applications, including ecological analysis and wildfire detection. Delve into the theoretical foundations of this work, which is based on four VLDB publications, and learn about the open-source code available for implementation.
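The overview mentions generating proxy scores by clustering records. The sketch below is a minimal illustration of that idea under stated assumptions; it is not TASTI's actual code. It assumes every record already has an embedding vector and that an expensive "oracle" model can label a record by index. It picks representatives with furthest-point-first clustering, calls the oracle only on those representatives, and scores every other record by inverse-distance weighting over its k nearest representatives. The function names, the Euclidean metric, and the default k are hypothetical choices made for illustration.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embedding vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def furthest_point_first(embeddings: List[Vector], num_reps: int) -> List[int]:
    """Pick representative records greedily: each new pick is the record
    farthest from every representative chosen so far."""
    reps = [0]  # start from an arbitrary record
    min_dist = [distance(e, embeddings[0]) for e in embeddings]
    while len(reps) < min(num_reps, len(embeddings)):
        nxt = max(range(len(embeddings)), key=lambda i: min_dist[i])
        if min_dist[nxt] == 0.0:
            break  # every record coincides with an existing representative
        reps.append(nxt)
        min_dist = [min(d, distance(e, embeddings[nxt]))
                    for d, e in zip(min_dist, embeddings)]
    return reps


def propagate_scores(embeddings: List[Vector], reps: List[int],
                     rep_scores: List[float], k: int = 2) -> List[float]:
    """Score each record by inverse-distance weighting of the oracle labels
    of its k nearest representatives."""
    scores = []
    for e in embeddings:
        nearest = sorted(range(len(reps)),
                         key=lambda j: distance(e, embeddings[reps[j]]))[:k]
        weights = []
        for j in nearest:
            d = distance(e, embeddings[reps[j]])
            if d == 0.0:
                weights = [(j, 1.0)]  # the record is a representative: reuse its label
                break
            weights.append((j, 1.0 / d))
        total = sum(w for _, w in weights)
        scores.append(sum(w * rep_scores[j] for j, w in weights) / total)
    return scores


def build_proxy_scores(embeddings: List[Vector], oracle: Callable[[int], float],
                       num_reps: int, k: int = 2) -> List[float]:
    """Call the expensive oracle only on representatives, then propagate."""
    reps = furthest_point_first(embeddings, num_reps)
    rep_scores = [oracle(i) for i in reps]  # the only expensive calls
    return propagate_scores(embeddings, reps, rep_scores, k)
```

For example, with five one-dimensional embeddings and `num_reps=3`, the oracle runs on only three records, and the other two receive interpolated scores. The cost saving comes from keeping the number of representatives small relative to the dataset, at the price of proxy scores that are only as good as the embedding's notion of similarity.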

Syllabus

Intro
Unstructured data is ubiquitous and cheap
ML models can perform well on a range of benchmark tasks
My work: how can we use unreliable and expensive ML models in query processing?
Two key ideas: sampling and proxy scores
Many queries require statistical guarantees on accuracy
Prior work using proxies fails to achieve statistical guarantees on failure probability!
Example query: finding hummingbirds with high recall
Query type two: aggregation (e.g., "What is the average number of cars per frame?")
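The aggregation query above can be answered by running the expensive model on a sample of frames instead of on every frame. The sketch below shows one simple way that proxy scores can help, and it is not the talk's algorithm. It assumes per-frame proxy scores already exist. Frames are sorted by proxy score and split into equal-size strata. A fixed number of frames is sampled uniformly within each stratum, and the per-stratum means are combined, each weighted by its stratum's share of frames. The names `stratified_average` and `oracle_count`, the stratum count, and the sample budget are hypothetical.

```python
import random
from typing import Callable, List


def stratified_average(proxy_scores: List[float],
                       oracle_count: Callable[[int], float],
                       num_strata: int,
                       samples_per_stratum: int,
                       seed: int = 0) -> float:
    """Estimate the mean oracle value over all frames, sampling within
    strata formed by sorting frames on their proxy scores."""
    rng = random.Random(seed)
    order = sorted(range(len(proxy_scores)), key=lambda i: proxy_scores[i])
    n = len(order)
    estimate = 0.0
    for s in range(num_strata):
        # A contiguous slice of the proxy-sorted frames forms one stratum.
        stratum = order[s * n // num_strata:(s + 1) * n // num_strata]
        if not stratum:
            continue
        sample = rng.sample(stratum, min(samples_per_stratum, len(stratum)))
        stratum_mean = sum(oracle_count(i) for i in sample) / len(sample)
        # Weight each stratum mean by the fraction of frames it contains.
        estimate += (len(stratum) / n) * stratum_mean
    return estimate


if __name__ == "__main__":
    # Synthetic data: true car counts plus a noisy proxy for each frame.
    gen = random.Random(42)
    true_counts = [gen.randint(0, 8) for _ in range(10_000)]
    proxies = [c + gen.gauss(0, 1.0) for c in true_counts]
    est = stratified_average(proxies, lambda i: true_counts[i],
                             num_strata=5, samples_per_stratum=40)
    print(f"estimate={est:.3f}  true={sum(true_counts) / len(true_counts):.3f}")
```

Sorting by the proxy groups frames with similar counts, so each stratum varies less than the dataset as a whole. With a good proxy, the same oracle budget (here 200 calls) therefore gives a tighter estimate than uniform sampling. The statistical guarantees the talk emphasizes would also require a confidence interval around the estimate, which this sketch omits.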


Taught by

Databricks

Related Courses

Data Management with SQL
openHPI
Programming Cloud Services for Android Handheld Systems
Vanderbilt University via Coursera
Getting and Cleaning Data
Johns Hopkins University via Coursera
Introduction to Web Programming Using Ruby
Rwaq (رواق)
MongoDB for .NET Developers
MongoDB University