🤖 AI Summary
LSM-tree key-value stores suffer from read and write bottlenecks and high storage overhead. LearnedKV addresses these with a two-tier key-value store that separates an LSM tree, which absorbs recent writes, from a standalone learned index, which serves reads. During garbage collection, a non-blocking conversion mechanism transforms LSM data into the learned index without interrupting ongoing operations. Unlike prior approaches, which use learned indexes mainly as auxiliary components inside the LSM tree, this tiered design substantially reduces the size of the LSM tree. LearnedKV supports adaptive model training and query optimization, and runs on both SSDs and HDDs. In the reported experiments it improves read throughput by up to 4.32× and write throughput by up to 1.43× over state-of-the-art LSM-based stores, and remains robust under skewed data distributions and hot-key access patterns.
📝 Abstract
We present LearnedKV, a novel tiered key-value store that seamlessly integrates a Log-Structured Merge (LSM) tree with a Learned Index to achieve superior read and write performance on storage systems. While existing approaches use learned indexes primarily as auxiliary components within LSM trees, LearnedKV employs a two-tier design where the LSM tree handles recent write operations while a separate Learned Index accelerates read performance. Our design includes a non-blocking conversion mechanism that efficiently transforms LSM data into a Learned Index during garbage collection, maintaining high performance without interrupting operations. LearnedKV dramatically reduces LSM size through this tiered approach, leading to significant performance gains in both reads and writes. Extensive evaluations across diverse workloads show that LearnedKV outperforms state-of-the-art LSM-based solutions by up to 4.32x for read operations and 1.43x for writes. The system demonstrates robust performance across different data distributions, access patterns, and storage media including both SSDs and HDDs.
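The two-tier idea described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class names (`LearnedIndex`, `TwoTierStore`), the `gc_threshold` parameter, the use of a plain dict as a stand-in for the LSM tree, and the single linear model fitted through the endpoints are all assumptions made for illustration. The conversion here also runs synchronously, so the non-blocking aspect of the paper's mechanism is not modeled.

```python
import bisect


class LearnedIndex:
    """Read tier (illustrative): sorted integer keys plus one linear position model."""

    def __init__(self, items):
        self.keys = sorted(items)
        self.values = [items[k] for k in self.keys]
        n = len(self.keys)
        if n >= 2:
            # Fit position ~ slope * key + intercept through the two endpoints.
            self.slope = (n - 1) / (self.keys[-1] - self.keys[0])
            self.intercept = -self.slope * self.keys[0]
        else:
            self.slope, self.intercept = 0.0, 0.0
        # Worst-case prediction error over stored keys bounds the lookup window.
        self.max_err = max(
            (abs(self._predict(k) - i) for i, k in enumerate(self.keys)),
            default=0,
        )

    def _predict(self, key):
        return int(self.slope * key + self.intercept)

    def get(self, key):
        n = len(self.keys)
        pos = self._predict(key)
        # Clamp the search window to valid indices.
        lo = min(max(0, pos - self.max_err), n)
        hi = max(lo, min(n, pos + self.max_err + 1))
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < n and self.keys[i] == key:
            return self.values[i]
        return None


class TwoTierStore:
    """Hypothetical two-tier store: a write tier in front of a learned read tier."""

    def __init__(self, gc_threshold=4):
        self.write_tier = {}  # stand-in for the LSM tree (assumption)
        self.read_tier = LearnedIndex({})
        self.gc_threshold = gc_threshold

    def put(self, key, value):
        self.write_tier[key] = value
        if len(self.write_tier) >= self.gc_threshold:
            self.garbage_collect()

    def get(self, key):
        # Recent writes shadow older data held in the learned index.
        if key in self.write_tier:
            return self.write_tier[key]
        return self.read_tier.get(key)

    def garbage_collect(self):
        # Merge both tiers, retrain the model, and empty the write tier.
        merged = dict(zip(self.read_tier.keys, self.read_tier.values))
        merged.update(self.write_tier)
        self.read_tier = LearnedIndex(merged)
        self.write_tier = {}
```

In this sketch, a lookup checks the small write tier first and falls back to the learned index, which searches only a window of `2 * max_err + 1` positions around the model's prediction. Keeping the write tier small is what the abstract credits for the gains; a real system would also need persistence, deletes, and a conversion that does not block foreground requests.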