DataSketches: A Production Quality Sketching Library for Big Data Analysis
Offered By: Databricks via YouTube
Course Description
Overview
Explore sketching algorithms for big data analysis in this 29-minute talk from Databricks. Learn why some queries over massive datasets are difficult to compute exactly, and how specialized algorithms called 'sketches' can return approximate but accurate answers to those queries. Discover how this technology has helped Yahoo reduce data processing times from days to minutes and enabled subsecond queries on real-time platforms. Get an introduction to DataSketches, an open-source library of core sketching algorithms designed for large production analysis and AI systems. Understand the properties of sketches, including query space partitioning, query speed, and time windowing. Learn about the benefits of sketching, such as lower system costs and improved scalability. Gain insight into the future of sketching algorithms and their potential impact on big data analysis.
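To make the idea of a sketch concrete, here is a minimal illustration in Python of a K-minimum-values (KMV) sketch for approximate distinct counting. It is a toy written for this listing, not the DataSketches API: the class name `KMVSketch`, its methods, and the choice of SHA-256 as the hash are all assumptions made for the example. It shows two of the properties named above: the sketch stays a fixed size no matter how much data it sees, and sketches built on separate partitions (for example, one per day) can be merged to answer a query over the whole window.

```python
import hashlib
import heapq


class KMVSketch:
    """Toy K-minimum-values sketch (hypothetical, for illustration only)."""

    def __init__(self, k=512):
        self.k = k
        self._heap = []        # max-heap (negated values) of the k smallest hashes
        self._members = set()  # same hashes, for fast duplicate checks

    @staticmethod
    def _hash(item):
        # Map an item to a pseudo-uniform float in [0, 1].
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def _add_hash(self, h):
        if h in self._members:
            return  # duplicate item (or already-retained hash): nothing to do
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # New hash is smaller than the largest retained one: swap them.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def update(self, item):
        self._add_hash(self._hash(item))

    def estimate(self):
        if len(self._heap) < self.k:
            return float(len(self._heap))  # fewer than k distinct hashes: exact
        # Standard KMV estimator: (k - 1) / (k-th smallest hash value).
        return (self.k - 1) / (-self._heap[0])

    def merge(self, other):
        # The k smallest hashes of the union summarize the combined stream.
        merged = KMVSketch(self.k)
        for h in self._members | other._members:
            merged._add_hash(h)
        return merged


# Two partitions of a stream (e.g. two days of traffic), with overlap.
day1, day2 = KMVSketch(), KMVSketch()
for user_id in range(0, 30_000):
    day1.update(user_id)
for user_id in range(20_000, 50_000):
    day2.update(user_id)

both_days = day1.merge(day2)
print(round(both_days.estimate()))  # close to 50,000 distinct users
```

Each sketch keeps at most `k` hash values, so memory and merge cost do not grow with the size of the input. The relative error of this estimator shrinks roughly as one over the square root of `k`.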
Syllabus
Introduction
Challenges with Big Data
Common Big Data Queries
Difficulty
Parallelization
Last 30 Days
The Sketch
Properties
Major Properties
Query Space
Partitioning
Query Speed
Time Windowing
Example
Lower System Cost
Team
Mission
Family Groups
The Future
Taught by
Databricks
Related Courses
Real-Time Analytics with Apache Storm
Twitter via Udacity
Introduction to NoSQL Data Solutions
Microsoft via edX
Big Data Emerging Technologies
Yonsei University via Coursera
Data Engineer, Big Data and ML on Google Cloud auf Deutsch
Google Cloud via Coursera
Leveraging Real-Time Analytics in Slack
Coursera Project Network via Coursera