Hyperspace - An Indexing Subsystem for Apache Spark
Offered By: Databricks via YouTube
Course Description
Overview
This 32-minute conference talk by Databricks covers the design, implementation, and operationalization of Hyperspace, an indexing subsystem for Apache Spark. It introduces the foundations of the indexing infrastructure, including the API design and its integration with Spark's Catalyst optimizer, and shows how users can build, maintain, and leverage indexes on various data formats to accelerate queries and reduce resource costs. The talk also describes the multi-user concurrency model and the roadmap for open-sourcing the technology. Presentations, benchmarks, code examples, and notebooks illustrate indexing for datasets ranging from gigabytes to petabytes, for both batch-style queries and exploratory analytics.
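For readers new to the topic, the general idea of building an index and using it to answer a query can be sketched in a few lines of plain Python. This is a toy, in-memory illustration only. It does not use Spark or the Hyperspace API, and the names `build_index`, `index_lookup`, `scan_lookup`, and the sample rows are hypothetical choices made for this sketch.

```python
from collections import defaultdict


def build_index(rows, indexed_col, included_cols):
    """Map each value of indexed_col to the included columns of matching rows."""
    index = defaultdict(list)
    for row in rows:
        index[row[indexed_col]].append({c: row[c] for c in included_cols})
    return dict(index)


def scan_lookup(rows, indexed_col, value, included_cols):
    """Baseline without an index: inspect every row."""
    return [
        {c: row[c] for c in included_cols}
        for row in rows
        if row[indexed_col] == value
    ]


def index_lookup(index, value):
    """With an index: jump straight to the matching entries."""
    return index.get(value, [])


# Hypothetical sample data, invented for this illustration.
rows = [
    {"dept_id": 10, "name": "Ada", "city": "Oslo"},
    {"dept_id": 20, "name": "Bo", "city": "Lima"},
    {"dept_id": 10, "name": "Cy", "city": "Pune"},
]

dept_index = build_index(rows, "dept_id", ["name"])

# Both paths return the same answer; the indexed path avoids the full scan.
assert index_lookup(dept_index, 10) == scan_lookup(rows, "dept_id", 10, ["name"])
```

The index stores only the columns a query needs, so the lookup never touches the original rows. The trade-off is that the index has to be rebuilt or updated whenever the underlying data changes, which is the maintenance concern the talk description mentions.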
Syllabus
Introduction
Who are we
What is an index
Overview
Investment
APIs
Index Creation
Index Benefits
Demo
Investment Areas
Hyperspace Types
Taught by
Databricks
Related Courses
Understanding China, 1700-2000: A Data Analytic Approach, Part 1
The Hong Kong University of Science and Technology via Coursera
The Analytics Edge
Massachusetts Institute of Technology via edX
大数据与信息传播 Big Data and Information Dissemination
Fudan University via Coursera
The Future of Fashion
Marist College via Independent
The Mobile Consumer
Marist College via Independent