Physical vs. Logical Indexing with IDEA - Inverted Deduplication-Aware Index
Offered By: USENIX via YouTube
Course Description
Overview
Explore an approach to term indexing in deduplicated storage systems in this 26-minute conference talk from FAST '24. As online data grows and storage systems increasingly rely on data deduplication, conventional inverted indexes that map each term directly to every file containing it become inefficient, because content shared by many files is indexed again for each of them. Learn about IDEA (Inverted Deduplication-Aware Index), which instead maps terms to the unique data chunks that contain them and maps each chunk to the files that contain it, so duplicated content is indexed only once. Discover how this design reduces index size, indexing time, and term-lookup latency while still supporting inline indexing, result ranking, and proximity search. The talk also presents a prototype implementation based on Lucene, which shows substantial improvements on these metrics over conventional indexing techniques.
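To make the two-level mapping concrete, here is a minimal sketch of a term-to-chunk and chunk-to-file index. It rests on several assumptions, none taken from the talk: the names `DedupAwareIndex`, `chunk_words`, `words_per_chunk`, and `logical_postings` are hypothetical; fixed-size chunking on word boundaries stands in for a real deduplication system's content-defined chunking; and SHA-1 fingerprints are used to detect duplicate chunks. It is not the Lucene-based IDEA prototype and omits ranking, proximity search, and terms that straddle chunk boundaries.

```python
import hashlib
from collections import defaultdict


def chunk_words(text, words_per_chunk=4):
    # Hypothetical fixed-size chunking: cut every `words_per_chunk` words so
    # no term straddles a chunk boundary. Real deduplicating storage systems
    # typically use content-defined chunking instead.
    words = text.split()
    return [" ".join(words[i:i + words_per_chunk])
            for i in range(0, len(words), words_per_chunk)]


class DedupAwareIndex:
    """Toy two-level index: term -> unique chunks, chunk -> containing files."""

    def __init__(self, words_per_chunk=4):
        self.words_per_chunk = words_per_chunk
        self.term_to_chunks = defaultdict(set)  # term -> chunk fingerprints
        self.chunk_to_files = defaultdict(set)  # chunk fingerprint -> file names

    def add_file(self, name, text):
        for chunk in chunk_words(text, self.words_per_chunk):
            fp = hashlib.sha1(chunk.encode("utf-8")).hexdigest()
            if fp not in self.chunk_to_files:
                # First time this chunk is seen: index its terms exactly once.
                for term in set(chunk.lower().split()):
                    self.term_to_chunks[term].add(fp)
            # A duplicate chunk only adds a chunk -> file posting.
            self.chunk_to_files[fp].add(name)

    def lookup(self, term):
        # Resolve term -> chunks, then chunks -> files.
        files = set()
        for fp in self.term_to_chunks.get(term.lower(), set()):
            files |= self.chunk_to_files[fp]
        return files

    def postings(self):
        # Returns (term -> chunk postings, chunk -> file postings).
        return (sum(len(s) for s in self.term_to_chunks.values()),
                sum(len(s) for s in self.chunk_to_files.values()))


def logical_postings(files):
    # Baseline file-level index: term -> files, built from every file's full text.
    term_to_files = defaultdict(set)
    for name, text in files.items():
        for term in set(text.lower().split()):
            term_to_files[term].add(name)
    return sum(len(s) for s in term_to_files.values())


files = {
    "a.txt": "the quick brown fox jumps over the lazy dog",
    "b.txt": "the quick brown fox jumps over the lazy dog",  # full duplicate of a.txt
    "c.txt": "the quick brown fox sleeps all day long",      # shares its first chunk
}
idx = DedupAwareIndex()
for name, text in files.items():
    idx.add_file(name, text)

print(sorted(idx.lookup("dog")))  # ['a.txt', 'b.txt']
print(idx.postings())             # (13, 8)
print(logical_postings(files))    # 24
```

In this toy input the chunk-level index stores 13 term postings plus 8 chunk-to-file postings, against 24 term postings for the file-level baseline. Each additional file that reuses existing chunks adds only chunk-to-file postings, so the savings generally grow with the amount of duplication; the cost is an extra indirection from chunks to files at lookup time.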
Syllabus
FAST '24 - Physical vs. Logical Indexing with IDEA: Inverted Deduplication-Aware Index
Taught by
USENIX
Related Courses
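Semantic Web Technologies - openHPI
Fundamentals of Information Retrieval (أساسيات استرجاع المعلومات) - Rwaq (رواق)
[gacco Special Project] Expanding Your gacco Learning Style with Evernote (《gacco特別企画》Evernoteで広がるgaccoの学びスタイル, ga038) - University of Tokyo via gacco
The Semantic Web: Tools for the Effective Publication and Extraction of Information on the Web (La Web Semántica: Herramientas para la publicación y extracción efectiva de información en la Web) - Pontificia Universidad Católica de Chile via Coursera
Rapid Learning (快速学习) - University of Science and Technology of China via Coursera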