Hyperspace - An Indexing Subsystem for Apache Spark

Offered By: Databricks via YouTube

Tags

Apache Spark Courses, Big Data Courses, Databricks Courses, Performance Tuning Courses, Data Analytics Courses

Course Description

Overview

Explore the design, implementation, and operationalization of Hyperspace, an indexing subsystem for Apache Spark, in this 32-minute conference talk published by Databricks. Learn the foundations of the indexing infrastructure, including its API design and its integration with Spark's Catalyst optimizer. See how Hyperspace lets users build, maintain, and use indexes on data stored in various formats to accelerate queries and reduce resource costs. The talk also covers the multi-user concurrency model and the roadmap for open-sourcing the project. Presentations, benchmarks, code examples, and notebooks show how indexing applies to datasets ranging from gigabytes to petabytes, for both batch-style queries and exploratory analytics.

Syllabus

Introduction
Who are we
What is an index
Overview
Investment
APIs
Index Creation
Index Benefits
Demo
Investment Areas
Hyperspace Types


Taught by

Databricks