FOCUS: Boosting Schema-aware Access for KV Stores via Hierarchical Data Management

📅 2025-05-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing key-value (KV) stores have a semantic gap with structured applications such as NewSQL databases, which must flatten structured records into flat key-value pairs, causing severe I/O amplification and I/O splitting. To bridge this gap, we propose FOCUS, a log-structured KV store that supports fine-grained hierarchical data organization and schema-aware access. The approach introduces: (1) a hierarchical KV data model that natively represents nested structures and preserves schema semantics; and (2) a log-structured engine, built from scratch, that combines hierarchical key-space management with schema-aware access, so upper-layer applications no longer need to flatten their records. Under YCSB SQL workloads, FOCUS achieves 2.1–5.9× higher throughput than mainstream NVM-backed KV stores.

📝 Abstract

Persistent key-value (KV) stores are critical infrastructure for data-intensive applications. Leveraging high-performance Non-Volatile Memory (NVM) to enhance KV stores has gained traction. However, previous work has focused primarily on optimizing KV stores themselves, without adequately addressing how they integrate with applications. Consequently, existing applications, typified by NewSQL databases, still rely on a flat mapping approach that simply maps structured records onto flat KV pairs in order to use KV stores. This semantic mismatch can cause significant I/O amplification and I/O splitting under production workloads, degrading performance. To address this, we propose FOCUS, a log-structured KV store optimized for fine-grained hierarchical data organization and schema-aware access. FOCUS introduces a hierarchical KV model that provides native support for upper-layer structured data. We implemented FOCUS from scratch. Experiments show that FOCUS improves throughput by 2.1–5.9× compared with mainstream NVM-backed KV stores under YCSB SQL workloads.
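The flat-mapping problem described in the abstract can be made concrete with a toy cost model. This is a hypothetical illustration, not code from FOCUS or from any particular NewSQL system: it assumes two simple ways of flattening a record into KV pairs (the whole record as one value, or one key per field) and only counts bytes written and KV operations issued. Under one plausible reading of the abstract, the row-per-key mapping shows I/O amplification (updating one field rewrites the whole record), while the per-field mapping shows I/O splitting (reading one record becomes one KV lookup per field). The class names `RowPerKey` and `ColumnPerKey` and the `reset` helper are illustrative only.

```python
from typing import Dict, List

Record = Dict[str, bytes]  # hypothetical record: field name -> encoded value


class RowPerKey:
    """Flat mapping A: the whole record is stored as one KV value."""

    def __init__(self) -> None:
        self.kv: Dict[str, Record] = {}
        self.bytes_written = 0  # bytes handed to the KV store by puts
        self.kv_ops = 0         # number of KV get/put calls issued

    def insert(self, pk: str, rec: Record) -> None:
        self.kv[pk] = dict(rec)
        self.kv_ops += 1
        self.bytes_written += sum(len(v) for v in rec.values())

    def update_field(self, pk: str, field: str, value: bytes) -> None:
        # Read-modify-write: changing one field rewrites the whole record.
        rec = dict(self.kv[pk])  # get
        rec[field] = value
        self.kv[pk] = rec        # put
        self.kv_ops += 2
        self.bytes_written += sum(len(v) for v in rec.values())

    def read_record(self, pk: str) -> Record:
        self.kv_ops += 1
        return dict(self.kv[pk])


class ColumnPerKey:
    """Flat mapping B: each field is stored under its own key 'pk/field'."""

    def __init__(self) -> None:
        self.kv: Dict[str, bytes] = {}
        self.bytes_written = 0
        self.kv_ops = 0

    def insert(self, pk: str, rec: Record) -> None:
        for field, value in rec.items():
            self.kv[f"{pk}/{field}"] = value
            self.kv_ops += 1
            self.bytes_written += len(value)

    def update_field(self, pk: str, field: str, value: bytes) -> None:
        self.kv[f"{pk}/{field}"] = value
        self.kv_ops += 1
        self.bytes_written += len(value)

    def read_record(self, pk: str, fields: List[str]) -> Record:
        # One logical read is split into one KV get per field.
        self.kv_ops += len(fields)
        return {f: self.kv[f"{pk}/{f}"] for f in fields}


def reset(store) -> None:
    """Zero the counters so the next operation is measured in isolation."""
    store.bytes_written = 0
    store.kv_ops = 0


if __name__ == "__main__":
    rec = {f"f{i}": b"x" * 100 for i in range(10)}  # 10 fields x 100 B
    row, col = RowPerKey(), ColumnPerKey()
    for s in (row, col):
        s.insert("42", rec)
        reset(s)
    row.update_field("42", "f3", b"y" * 100)
    col.update_field("42", "f3", b"y" * 100)
    print("update one field:", row.bytes_written, "B vs", col.bytes_written, "B")
    reset(row)
    reset(col)
    row.read_record("42")
    col.read_record("42", list(rec))
    print("read one record:", row.kv_ops, "op vs", col.kv_ops, "ops")
```

Running it prints `update one field: 1000 B vs 100 B` and `read one record: 1 op vs 10 ops`: each flat mapping is cheap for one access pattern and expensive for the other, which is the tension a schema-aware store aims to remove.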

Problem

Research questions and friction points this paper is trying to address.

Addresses semantic mismatch in KV stores for structured data

Reduces I/O amplification and splitting in production workloads

Enables schema-aware access via hierarchical data organization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical KV model for structured data

Log-structured KV store optimization

Schema-aware access that avoids flat-mapping overheads
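To make the idea of a hierarchical KV model on a log-structured store more concrete, here is a minimal sketch under stated assumptions. Everything in it is hypothetical and chosen for illustration: the class name `HierarchicalLogStore`, the three-level key path (table, primary key, field), the in-memory index, and the schema check. The summary does not describe FOCUS's actual data structures. The sketch only shows how a store that understands record structure can append just the fields an update touches and resolve a whole record through a single index lookup on its record node.

```python
from typing import Dict, List, Optional, Tuple

LogEntry = Tuple[str, str, str, bytes]  # (table, primary key, field, value)


class HierarchicalLogStore:
    """Toy hierarchical KV store with key path table -> primary key -> field.

    Values live in an append-only log; an in-memory index maps each record
    node (table, pk) to the log offsets of its latest field values.
    """

    def __init__(self, schemas: Dict[str, List[str]]) -> None:
        self.schemas = schemas  # table -> ordered field names
        self.log: List[LogEntry] = []
        self.index: Dict[Tuple[str, str], Dict[str, int]] = {}
        self.bytes_appended = 0
        self.index_lookups = 0

    def put(self, table: str, pk: str, fields: Dict[str, bytes]) -> None:
        # Insert or partial update: only the supplied fields are appended.
        unknown = set(fields) - set(self.schemas[table])
        if unknown:
            raise KeyError(f"fields not in schema of {table!r}: {sorted(unknown)}")
        node = self.index.setdefault((table, pk), {})
        for name, value in fields.items():
            self.log.append((table, pk, name, value))
            self.bytes_appended += len(value)
            node[name] = len(self.log) - 1  # newer entry supersedes older one

    def get(self, table: str, pk: str,
            fields: Optional[List[str]] = None) -> Dict[str, bytes]:
        # One lookup on the record node resolves every requested field;
        # without a projection, fields are returned in schema order.
        self.index_lookups += 1
        node = self.index[(table, pk)]
        if fields is None:
            fields = [f for f in self.schemas[table] if f in node]
        return {f: self.log[node[f]][3] for f in fields}


if __name__ == "__main__":
    schema = {"usertable": [f"field{i}" for i in range(10)]}  # YCSB-style names
    store = HierarchicalLogStore(schema)
    store.put("usertable", "42", {f: b"x" * 100 for f in schema["usertable"]})
    store.put("usertable", "42", {"field3": b"y" * 100})  # partial update
    rec = store.get("usertable", "42")
    print(store.bytes_appended, len(rec), store.index_lookups)
```

Running it prints `1100 10 1`: the partial update appends 100 bytes rather than rewriting the 1000-byte record, and the whole record comes back from one index lookup. The superseded `field3` entry remains in the log; a real log-structured store would reclaim such stale entries through garbage collection, which this sketch omits, and each field is still fetched from its own log offset here.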

🔎 Similar Papers

LearnedKV: Integrating LSM and Learned Index for Superior Performance on Storage