Rethinking LSM-tree based Key-Value Stores: A Survey

📅 2025-07-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
While LSM-trees optimize write throughput, their compaction and flush operations induce fundamental trade-offs (write, read, and space amplification) along with performance volatility and resource contention, which are exacerbated in multi-tenant distributed environments. This paper presents a systematic survey of LSM-tree optimization research from 2019 to 2024, employing bibliometric analysis and a technical taxonomy to categorize advances in tiered compaction, cache-aware coordination, read/write path separation, workload-aware scheduling, and distributed resource management. Distinct from prior surveys, it analyzes the co-design challenges arising from multi-tier storage hierarchies, heterogeneous workloads, and cross-tenant resource contention; characterizes the Pareto boundaries among the three amplification effects; and proposes future directions targeting high throughput, low latency, and strong tenant isolation. The synthesis provides both theoretical foundations and practical engineering guidance for next-generation high-performance key-value stores.

📝 Abstract
The LSM-tree is a widely adopted data structure in modern key-value stores. It optimizes write performance for write-heavy applications by using append-only writes, which turn updates into sequential I/O. However, the unpredictability of LSM-tree compaction introduces significant challenges: performance variability during peak workloads and in resource-constrained environments, write amplification caused by rewriting data during compactions, read amplification from multi-level queries, a trade-off between read and write performance, and the need for efficient space utilization to mitigate space amplification. Prior studies on LSM-tree optimizations have addressed these challenges; however, new optimizations have continued to be proposed in recent years. The goal of this survey is to review LSM-tree optimization, focusing on representative works from the past five years. The survey first studies existing solutions for mitigating the performance impact of LSM-tree flush and compaction and for improving basic key-value operations. In addition, distributed key-value stores serve multiple tenants, ranging from tens of thousands to millions of users with diverse requirements. The survey then analyzes the new challenges and opportunities in these modern architectures and across various application scenarios. Unlike existing survey papers, this survey provides a detailed discussion of state-of-the-art work on LSM-tree optimizations and outlines future research directions.
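To make the amplification terms in the abstract concrete, the following is a minimal sketch of a toy LSM-tree, not an implementation taken from the survey or from any real store. The class name `ToyLSM`, its parameters `memtable_limit` and `max_runs`, and the "merge everything into one run" compaction policy are all assumptions chosen for illustration. It counts entries written by flush and compaction to estimate write amplification, and counts the on-disk runs probed by a lookup as a stand-in for read amplification.

```python
class ToyLSM:
    """Toy LSM-tree (illustrative only): a memtable plus sorted runs, newest first."""

    def __init__(self, memtable_limit=4, max_runs=2):
        self.memtable = {}            # in-memory write buffer
        self.runs = []                # "on-disk" sorted runs, newest first
        self.memtable_limit = memtable_limit
        self.max_runs = max_runs
        self.user_writes = 0          # entries written by the application
        self.disk_writes = 0          # entries written by flush and compaction

    def put(self, key, value):
        self.memtable[key] = value
        self.user_writes += 1
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Write the memtable out as one sorted run (a sequential write).
        run = dict(sorted(self.memtable.items()))
        self.disk_writes += len(run)
        self.runs.insert(0, run)
        self.memtable = {}
        if len(self.runs) > self.max_runs:
            self._compact()

    def _compact(self):
        # Assumed policy: merge all runs into one; newer values win.
        merged = {}
        for run in reversed(self.runs):   # oldest first, so newer runs overwrite
            merged.update(run)
        merged = dict(sorted(merged.items()))
        self.disk_writes += len(merged)   # rewritten data drives write amplification
        self.runs = [merged]

    def get(self, key):
        """Return (value, number of on-disk runs probed)."""
        if key in self.memtable:
            return self.memtable[key], 0
        probed = 0
        for run in self.runs:             # newest run first
            probed += 1
            if key in run:
                return run[key], probed
        return None, probed

    def write_amplification(self):
        return self.disk_writes / self.user_writes if self.user_writes else 0.0


db = ToyLSM(memtable_limit=4, max_runs=2)
for i in range(12):
    db.put(i, str(i))
print(db.write_amplification())   # entries written to disk per user write
print(db.get(0))                  # (value, runs probed)
```

In this sketch every compaction rewrites all existing data, so fewer runs (cheaper reads) are bought with more rewriting (costlier writes). That is the read/write trade-off the abstract refers to, in its crudest form; real stores use leveled or tiered policies with very different constants.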
Problem

Research questions and friction points this paper is trying to address.

Address performance variability during peak workloads
Reduce write and read amplification in LSM-trees
Optimize space utilization in key-value stores
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimizing LSM-tree flush and compaction performance
Improving basic key-value operations efficiency
Addressing distributed multi-tenant architecture challenges
Yina Lv
Assistant Professor, Xiamen University
Storage Systems
Qiao Li
Department of Computer Science, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE
Quanqing Xu
Ant Group
Cloud Computing, Cloud Storage, Large-scale Hybrid Storage Systems
Congming Gao
Xiamen University
Flash Memory, SSD, Computer Architecture
Chuanhui Yang
OceanBase, Ant Group, China
Xiaoli Wang
School of Informatics, Xiamen University, China
Chun Jason Xue
Mohamed bin Zayed University of Artificial Intelligence (MBZUAI)
Systems and Storage