🤖 AI Summary
Existing key-value (KV) stores exhibit a semantic gap with structured applications such as NewSQL systems: hierarchical data must be flattened into plain key-value pairs, which causes severe I/O amplification and I/O splitting. To bridge this gap, we propose FOCUS, a log-structured KV store supporting fine-grained hierarchical data organization and schema-aware access. Our approach introduces (1) a hierarchical KV data model that natively represents nested structures and preserves schema semantics, and (2) an NVM-optimized log-structured engine that integrates hierarchical key-space management with schema-aware write-path optimization and query routing, eliminating the need for flattening entirely. Under YCSB SQL workloads, FOCUS achieves 2.1–5.9× higher throughput than mainstream NVM-backed KV stores, improving the efficiency of structured data ingestion and retrieval.
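The cost of flattening can be made concrete with a small sketch. The code below assumes two common flat layouts for a structured record: row-per-key (the whole record serialized into one value) and column-per-key (one KV pair per column). It also assumes a reading of the two terms that the source does not spell out. Here, I/O amplification means reading a whole row to serve a few columns, and I/O splitting means one logical read turning into several KV gets. The key formats, the `;`-joined encoding, and all function names are hypothetical.

```python
# Hypothetical illustration of two "flat mapping" layouts for a structured
# record; key formats, encodings, and names are assumptions, not FOCUS code.


def flat_row_mapping(table: str, pk: int, record: dict[str, str]) -> dict[str, str]:
    """Row-per-key: the whole record is serialized into a single value."""
    value = ";".join(f"{col}={val}" for col, val in sorted(record.items()))
    return {f"{table}/{pk}": value}


def flat_column_mapping(table: str, pk: int, record: dict[str, str]) -> dict[str, str]:
    """Column-per-key: every column becomes its own KV pair."""
    return {f"{table}/{pk}/{col}": val for col, val in record.items()}


def read_cost_row(kv: dict[str, str], table: str, pk: int, columns: list[str]) -> dict[str, int]:
    # One get, but the entire serialized row is read regardless of which
    # columns were requested (assumed meaning of I/O amplification).
    value = kv[f"{table}/{pk}"]
    return {"gets": 1, "bytes_read": len(value.encode())}


def read_cost_column(kv: dict[str, str], table: str, pk: int, columns: list[str]) -> dict[str, int]:
    # Bytes read match the request, but one logical read becomes
    # len(columns) separate gets (assumed meaning of I/O splitting).
    keys = [f"{table}/{pk}/{col}" for col in columns]
    return {"gets": len(keys), "bytes_read": sum(len(kv[k].encode()) for k in keys)}


record = {"name": "alice", "email": "a@example.com", "bio": "x" * 100, "city": "Paris"}
wanted = ["name", "city"]

row_cost = read_cost_row(flat_row_mapping("users", 1, record), "users", 1, wanted)
col_cost = read_cost_column(flat_column_mapping("users", 1, record), "users", 1, wanted)
print(row_cost)  # {'gets': 1, 'bytes_read': 146}
print(col_cost)  # {'gets': 2, 'bytes_read': 10}
```

Neither flat layout serves the two-column read cleanly. The row layout reads 146 bytes to return 10, and the column layout needs two separate lookups for a single logical access.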
📝 Abstract
Persistent key-value (KV) stores are critical infrastructure for data-intensive applications, and leveraging high-performance Non-Volatile Memory (NVM) to enhance them has gained traction. However, previous work has focused primarily on optimizing KV stores themselves, without adequately addressing how applications integrate with them. Consequently, existing applications, exemplified by NewSQL databases, still resort to a flat mapping approach that simply maps structured records into flat KV pairs. This semantic mismatch may cause significant I/O amplification and I/O splitting under production workloads, degrading performance. To address this, we propose FOCUS, a log-structured KV store optimized for fine-grained hierarchical data organization and schema-aware access. FOCUS introduces a hierarchical KV model that provides native support for structured data from upper layers. We implemented FOCUS from scratch. Experiments show that FOCUS increases throughput by 2.1–5.9× compared to mainstream NVM-backed KV stores under YCSB SQL workloads.
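The abstract does not describe FOCUS's interface or on-NVM layout. As a rough intuition for what a hierarchical, schema-aware KV model can offer over flat mapping, the sketch below stores each record as one top-level entry keyed by `(table, primary key)`. That entry holds a nested column-to-value map, and the store is given the table schema. The class `HierarchicalKV` and its `put`/`get` methods are hypothetical illustrations, not FOCUS's actual API.

```python
from typing import Optional

# Hypothetical sketch of a two-level (hierarchical) KV model; names and
# behavior are illustrative assumptions, not FOCUS's actual interface.


class HierarchicalKV:
    def __init__(self, schema: dict[str, list[str]]) -> None:
        # Schema-aware: the store knows each table's column names.
        self.schema = schema
        # Top level: (table, primary key) -> nested column map for one record.
        self.records: dict[tuple[str, int], dict[str, str]] = {}
        self.lookups = 0  # number of top-level lookups performed by get()

    def put(self, table: str, pk: int, columns: dict[str, str]) -> None:
        # Reject columns that are not part of the table's schema.
        unknown = set(columns) - set(self.schema[table])
        if unknown:
            raise KeyError(f"columns not in schema of {table!r}: {sorted(unknown)}")
        # Insert the record, or partially update its nested column map in place.
        self.records.setdefault((table, pk), {}).update(columns)

    def get(self, table: str, pk: int, columns: Optional[list[str]] = None) -> dict[str, str]:
        # A single top-level lookup serves a multi-column read.
        self.lookups += 1
        record = self.records[(table, pk)]
        wanted = columns if columns is not None else self.schema[table]
        # Only the requested columns are returned.
        return {col: record[col] for col in wanted if col in record}


db = HierarchicalKV({"users": ["name", "email", "city"]})
db.put("users", 1, {"name": "alice", "email": "a@example.com", "city": "Paris"})
db.put("users", 1, {"city": "Lyon"})         # partial update of one column
print(db.get("users", 1, ["name", "city"]))  # {'name': 'alice', 'city': 'Lyon'}
print(db.lookups)                            # 1
```

In this in-memory sketch the benefit is only logical. A multi-column read costs one lookup and returns just the requested columns. Whether a real log-structured engine also avoids reading unrequested bytes depends on its persistent layout, which the abstract does not specify.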