Sketching Algorithms: Making Sense of Big Data in a Single Stroke
Offered By: Conf42 via YouTube
Course Description
Overview
Explore sketching algorithms for big data analysis in this conference talk from Conf42 Python 2024. The talk introduces sketches as approximate data structures and covers their characteristics, their components, and their advantages over exact computation. It examines the challenges of distributed processing, and why sublinear growth of the data structure and mergeability matter in sketch design. It then surveys several types of sketches, with a focus on the Count-Min Sketch, and open-source sketching libraries such as Apache DataSketches and their extensions. By the end, you should understand how sketches address non-additive problems such as unique counts, and why they offer faster solutions to big data problems.
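The talk's own code is not reproduced on this page. The following is a minimal, illustrative sketch of a mergeable Count-Min Sketch in plain Python. The class name `CountMinSketch`, its `width`/`depth` parameters, and the use of row-salted `hashlib.blake2b` digests as the per-row hash functions are all assumptions made for illustration. They are not the talk's or Apache DataSketches' API. The example shows the two properties the overview highlights: memory is fixed by `width × depth` regardless of stream length, and two sketches built on separate partitions can be merged by adding their counters cell by cell.

```python
import hashlib


class CountMinSketch:
    """Hypothetical, minimal Count-Min Sketch (illustrative only)."""

    def __init__(self, width=1000, depth=5):
        # Memory is fixed at width * depth counters, however many items arrive.
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Assumption: one hash function per row, derived by prefixing the row number.
        data = f"{row}:{item}".encode()
        digest = hashlib.blake2b(data, digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, item, count=1):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions only ever inflate counters, so the minimum across rows
        # is the tightest estimate; it never undercounts.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))

    def merge(self, other):
        # Mergeability: sketches with identical dimensions and hash functions
        # combine by cell-wise addition, with no reshuffling of raw data.
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("can only merge sketches with identical dimensions")
        for row in range(self.depth):
            for col in range(self.width):
                self.table[row][col] += other.table[row][col]
        return self


if __name__ == "__main__":
    part_a = "the cat sat on the mat".split()
    part_b = "the dog sat on the log".split()

    # Build one sketch per partition, as separate workers would.
    sketch_a, sketch_b = CountMinSketch(), CountMinSketch()
    for word in part_a:
        sketch_a.add(word)
    for word in part_b:
        sketch_b.add(word)

    merged = sketch_a.merge(sketch_b)
    print("estimated count of 'the':", merged.estimate("the"))  # never below the true count of 4

    # Unique counts are non-additive: per-partition distinct counts don't sum.
    print(len(set(part_a)) + len(set(part_b)), "vs", len(set(part_a + part_b)))  # 10 vs 7
```

Frequency counts add cleanly, which is why the merged sketch equals one built from the combined stream. Distinct counts do not add (5 + 5 ≠ 7 above), which is the non-additive problem that distinct-count sketches are designed to merge around.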
Syllabus
intro
preamble
hello
Quix
Quix Streams
Quix Cloud
what is a sketch?
approximate answers
sketch characteristics
sketch components
why exact == slow
distributed processing
unique word count
massively parallel processing (MPP)
shuffling is slow
latency numbers every programmer should know
why sketches == fast
sketch design
sublinear data structure growth
mergeability
non-additive challenges are everywhere
unique counts are non-additive
non-additive challenges solved
types of sketches
Count-Min Sketch
open source sketches
Apache DataSketches: Java, C++, Python
datasketch extensions
thank you
Taught by
Conf42