DataSketches: A Production Quality Sketching Library for Big Data Analysis

Offered By: Databricks via YouTube

Tags

Big Data Courses, Scalability Courses, Real-Time Analytics Courses, Sketching Algorithms Courses

Course Description

Overview

Explore sketching algorithms for big data analysis in this 29-minute talk from Databricks. Learn why processing massive datasets is hard and how specialized algorithms called "sketches" give accurate approximate answers to queries that are difficult to answer exactly at scale. Discover how this technology helped Yahoo reduce data processing times from days to minutes and enabled subsecond queries on real-time platforms. Get an introduction to DataSketches, an open-source library of core sketching algorithms designed for large-scale production analysis and AI systems. Understand the properties of sketches, including query space partitioning, query speed, and time windowing. Learn about the benefits of sketching, such as lower system costs and improved scalability, and hear where sketching algorithms may go next and how they could affect big data analysis.
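
To make the idea of a sketch concrete, here is a minimal illustration in Python of one classic technique, a K-Minimum-Values (KMV) sketch for approximate distinct counting. It is a toy written for this listing under stated assumptions, not the DataSketches library's API and not code from the talk: the class name `KMVSketch`, its methods, and the choice of `k` are all hypothetical. It shows two of the properties listed above: a small, fixed-size summary answers a count-distinct query, and sketches built on separate partitions or time windows can be merged afterwards.

```python
import hashlib
import heapq


class KMVSketch:
    """Toy K-Minimum-Values sketch (hypothetical, for illustration only)."""

    def __init__(self, k=1024):
        self.k = k
        self._heap = []        # max-heap (values negated) of the k smallest hashes
        self._members = set()  # same hashes, for duplicate detection

    @staticmethod
    def _hash(item):
        # Map an item to a pseudo-uniform float in [0, 1).
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def _offer(self, h):
        if h in self._members:
            return  # already retained; duplicates never change the sketch
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # New hash is smaller than the current k-th smallest: swap it in.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def update(self, item):
        self._offer(self._hash(item))

    def merge(self, other):
        # The k smallest hashes of the union come from the two retained sets.
        merged = KMVSketch(self.k)
        for h in self._members | other._members:
            merged._offer(h)
        return merged

    def estimate(self):
        if len(self._heap) < self.k:
            return float(len(self._heap))  # fewer than k distinct items: exact
        kth_smallest = -self._heap[0]
        return (self.k - 1) / kth_smallest


# Usage: one sketch per day, merged to answer a two-day distinct-user query.
day1, day2 = KMVSketch(), KMVSketch()
for user_id in range(0, 60_000):
    day1.update(user_id)
for user_id in range(40_000, 100_000):
    day2.update(user_id)

both_days = day1.merge(day2)
print(f"approximate distinct users over both days: {both_days.estimate():.0f}")
```

Each sketch here retains at most 1,024 hash values no matter how many items it sees, and the merge step never revisits the raw data. That is the general reason sketches suit partitioned and time-windowed queries. Production libraries use more refined algorithms and provide error bounds, which this toy does not attempt.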

Syllabus

Introduction
Challenges with Big Data
Common Big Data Queries
Difficulty
Parallelization
Last 30 Days
The Sketch
Properties
Major Properties
Query Space
Partitioning
Query Speed
Time Windowing
Example
Lower System Cost
Team
Mission
Family Groups
The Future


Taught by

Databricks

Related Courses

On Quantum Linear Algebra for Machine Learning - Quantum Colloquium
Simons Institute via YouTube
On Quantum Linear Algebra for Machine Learning - IPAM at UCLA
Institute for Pure & Applied Mathematics (IPAM) via YouTube
Streaming and Learning Algorithms - Session 7C
IEEE via YouTube
Capacity Analysis of Vector Symbolic Architectures
Simons Institute via YouTube
Sketching Algorithms for Max-DICUT and Other CSPs
Simons Institute via YouTube