YoVDO

Fast RDMA-based Ordered Key-Value Store Using Remote Learned Cache

Offered By: USENIX via YouTube

Tags

OSDI (Operating Systems Design and Implementation) Courses
Machine Learning Courses
Distributed Systems Courses

Course Description

Overview

This USENIX OSDI '20 conference talk presents XSTORE, an RDMA-based ordered key-value store built around a remote learned cache. XSTORE uses a hybrid architecture: a tree-based index at the server handles dynamic workloads, while a learned cache at the client handles static workloads. The learned cache uses machine learning models as the cache structure for the tree-based index, which decouples model retraining from index updating. The talk reports that XSTORE outperforms state-of-the-art RDMA-based ordered key-value stores by up to 5.9 times. It also covers the challenges of learned caches and the solutions employed, including handling stale models and ensuring correctness through validation mechanisms, and discusses the memory-performance trade-offs and the potential for significant client-side memory savings.
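
As a rough illustration of the learned-cache idea in the overview, the sketch below fits a deliberately crude linear model from keys to positions in a sorted array, records its worst-case error, and then validates each lookup by searching only the predicted window. It is not XSTORE's implementation: the `LearnedCache` class, the `get` helper, the two-point linear fit, and the local Python lists standing in for server memory are all assumptions made for illustration.

```python
from bisect import bisect_left


class LearnedCache:
    """Toy model (hypothetical): maps a key to a bounded position range."""

    def __init__(self, sorted_keys):
        n = len(sorted_keys)
        first, last = sorted_keys[0], sorted_keys[-1]
        # Fit a line through the first and last key (a deliberately crude model).
        self.slope = (n - 1) / (last - first) if last != first else 0.0
        self.intercept = -self.slope * first
        # Record the worst-case prediction error so lookups can bound the search.
        self.max_err = max(
            abs(self._predict(k) - i) for i, k in enumerate(sorted_keys)
        )

    def _predict(self, key):
        return round(self.slope * key + self.intercept)

    def predict_range(self, key, n):
        """Return an inclusive [lo, hi] window, clamped to valid positions."""
        p = self._predict(key)
        lo = min(max(0, p - self.max_err), n - 1)
        hi = max(min(n - 1, p + self.max_err), 0)
        return lo, hi


def get(cache, sorted_keys, values, key):
    """Look up `key` by searching only the window the model predicts."""
    lo, hi = cache.predict_range(key, len(sorted_keys))
    # Assumption: in an RDMA setting, this window is what the client would
    # fetch remotely; here it is just a slice of a local list.
    i = bisect_left(sorted_keys, key, lo, hi + 1)
    if i <= hi and sorted_keys[i] == key:
        return values[i]  # validation passed: the key really is at position i
    return None


keys = [2, 3, 5, 8, 13, 21, 34]
values = [k * 10 for k in keys]
cache = LearnedCache(keys)
print(get(cache, keys, values, 13))  # 130
print(get(cache, keys, values, 7))   # None (key not stored)
```

The point of the error bound is that the model never has to be exact: a small model plus a bounded search replaces a cached copy of the tree's inner nodes, which is where client-side memory savings would come from.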

Syllabus

Intro
KVS: key pillar for distributed systems
Traditional KVS uses RPC (Server-centric)
Challenge: limited NIC abstraction
Existing systems adopt caching
High cache miss cost for caching tree: tree node size can be much larger than the KV
Trade-off of existing KVS
Overview of XSTORE: hybrid architecture
Our approach: learned cache. Using ML as the cache structure for tree-based index, motivated by the learned index [1]
Client-direct Get() using learned cache
Benefits of the learned cache
Challenges of learned cache
Outline of the remaining content: server-side data structure for dynamic workloads
Models cannot learn dynamic B+Tree addresses: they can only learn when the addresses are sorted
Solution: another layer of indirection. Observation: leaf nodes are logically sorted
Client-direct Get() using model & translation table (TT)
Model retraining: model is retrained at server in background threads. Small cost & extra CPU usage at the server
Stale model handling: background update causes stale learned models
Performance of XSTORE on YCSB: 100M KVS, uniform workloads
Sensitive to the dataset
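
The syllabus items on the extra layer of indirection, the client-direct Get() using model & TT, and stale model handling can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the talk's actual design: TT is read here as a translation table from logical leaf number to leaf address, the "model" is a plain stand-in (the first key of each leaf) rather than a trained ML model, a Python dict stands in for RDMA-readable server memory, and the `Server` and `Client` classes and the "fall back to the server on any miss" rule are hypothetical.

```python
from bisect import bisect_right


class Server:
    """Holds leaves at arbitrary addresses plus a translation table (TT)."""

    def __init__(self, items, leaf_size=2):
        items = sorted(items)
        self.leaf_size = leaf_size
        self.memory = {}  # address -> {key: value}; stands in for remote memory
        self.tt = []      # logical leaf number -> address
        for n, i in enumerate(range(0, len(items), leaf_size)):
            addr = 1000 - 10 * n  # addresses deliberately not in key order
            self.memory[addr] = dict(items[i:i + leaf_size])
            self.tt.append(addr)

    def _first_keys(self):
        return [min(self.memory[a]) for a in self.tt]

    def train(self):
        """Return a (model, TT snapshot) pair; the model is only a stand-in."""
        return self._first_keys(), list(self.tt)

    def rpc_get(self, key):
        """Server-side lookup: the slow path a client falls back to."""
        for addr in self.tt:
            if key in self.memory[addr]:
                return self.memory[addr][key]
        return None

    def insert(self, key, value):
        n = max(0, bisect_right(self._first_keys(), key) - 1)
        leaf = self.memory[self.tt[n]]
        leaf[key] = value
        if len(leaf) > self.leaf_size:  # split: move the upper half to a new leaf
            ks = sorted(leaf)
            upper = {k: leaf.pop(k) for k in ks[len(ks) // 2:]}
            new_addr = min(self.memory) - 10
            self.memory[new_addr] = upper
            self.tt.insert(n + 1, new_addr)  # logical order is preserved


class Client:
    def __init__(self, server):
        self.server = server
        self.fallbacks = 0
        self.refresh()

    def refresh(self):
        """Fetch a freshly trained model and TT snapshot from the server."""
        self.firsts, self.tt = self.server.train()

    def get(self, key):
        n = max(0, bisect_right(self.firsts, key) - 1)  # model: logical leaf
        addr = self.tt[n]                                # TT: leaf address
        leaf = self.server.memory[addr]                  # stands in for an RDMA read
        if key in leaf:
            return leaf[key]
        # Miss: the key is absent or the cached model is stale; ask the server.
        self.fallbacks += 1
        return self.server.rpc_get(key)


server = Server([(k, k * 10) for k in [1, 2, 3, 4, 5, 6]])
client = Client(server)
server.insert(7, 70)   # splits the last leaf; the client's model is now stale
print(client.get(7))   # 70, served through the fallback path
client.refresh()
print(client.get(7))   # 70, served directly again
```

Because the model predicts logical leaf numbers rather than raw addresses, a split only changes the TT, and a client with a stale model still gets correct answers at the cost of a slower fallback until it refreshes.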


Taught by

USENIX

Related Courses

Advanced Operating Systems
Georgia Institute of Technology via Udacity
High Performance Computing
Georgia Institute of Technology via Udacity
GT - Refresher - Advanced OS
Georgia Institute of Technology via Udacity
Distributed Machine Learning with Apache Spark
University of California, Berkeley via edX
CS125x: Advanced Distributed Machine Learning with Apache Spark
University of California, Berkeley via edX