Don't Forget Range Delete! Enhancing LSM-based Key-Value Stores with More Compatible Lookups and Deletes

📅 2025-11-08
🤖 AI Summary
Range tombstones let LSM-trees delete key ranges efficiently, but they slow point lookups: the paper measures a 30% increase in point-lookup latency even when range deletes make up just 1% of the workload, a trade-off it reports had not been raised before. The paper proposes GLORAN, a range delete method that integrates into modern LSM-based systems and keeps range deletes efficient without compromising point lookups. Its two core components are a global index, which lets a point lookup locate the relevant deleted ranges without retrieving many irrelevant ones, and an entry validity estimator, which further lowers the expected I/O cost of looking up existing keys. In the evaluation, GLORAN achieves up to 10.6× faster point lookups and up to 2.7× higher overall throughput than the state-of-the-art method.

📝 Abstract
LSM-trees are characterized by out-of-place updates: a key is deleted by inserting a tombstone that marks it as stale instead of removing it in place. This defers the actual removal to compactions and greatly reduces overhead. However, this classic strategy struggles with another fundamental operation, the range delete, which removes all keys within a specified range: the system must insert numerous tombstones, causing severe performance issues. To address this, modern LSM-based systems introduce range tombstones that record only the start and end keys, avoiding per-key tombstones. Although this achieves impressive range delete efficiency, the solution is poorly compatible with lookups. In particular, our experiments show that point lookup latency can increase by 30% even when range deletes make up just 1% of the workload. To our surprise, this issue has not been raised before, even though the range tombstone solution has been in use for more than five years. To address this critical performance issue, we propose GLORAN, an efficient range delete method that can be integrated into modern LSM-based systems and offers desirable range delete performance without compromising point lookup efficiency. It introduces a global index that allows point lookups to quickly locate relevant ranges without retrieving many irrelevant elements, reducing the I/O complexity from O(N/lambda) to either O(log^2 N/(lambda F)) or O(phi log N/F), where 1/lambda is the ratio of range deletes and phi is the false positive rate (FPR) of the Bloom filters in LSM-trees. Furthermore, we design an entry validity estimator that reduces the expected I/O cost of looking up existing keys to O(epsilon log^2 N/(lambda F)). Extensive evaluations indicate that GLORAN consistently outperforms baselines, achieving up to 10.6 times faster point lookups and 2.7 times higher overall throughput than the SOTA method.
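
The trade-off described in the abstract can be illustrated with a toy, in-memory model. This is only a sketch under stated assumptions, not code from the paper or from any real LSM engine: the dictionary `memtable`, the list `range_tombstones`, the sequence numbers, and all three function names are hypothetical. It shows that a range tombstone replaces many per-key tombstones with a single record, while every point lookup must then also check the recorded ranges.

```python
# Toy model only: a dict stands in for the LSM-tree, and all names are hypothetical.

def range_delete_per_key(memtable, keys_in_range, seq):
    """Classic strategy: write one tombstone per key in the range."""
    for k in keys_in_range:
        memtable[k] = (seq, None)      # value None marks a tombstone
    return len(keys_in_range)          # number of tombstones written


def range_delete_tombstone(range_tombstones, start, end, seq):
    """Range-tombstone strategy: record only [start, end) and a sequence number."""
    range_tombstones.append((start, end, seq))
    return 1                           # a single record, whatever the range size


def point_lookup(memtable, range_tombstones, key):
    """Return the live value for key, or None if it is absent or deleted."""
    entry = memtable.get(key)
    if entry is None:
        return None
    seq, value = entry
    if value is None:                  # per-key tombstone
        return None
    # Extra work caused by range tombstones: the entry is only valid if no
    # newer range tombstone covers it. Here that is a linear scan.
    for start, end, del_seq in range_tombstones:
        if start <= key < end and del_seq > seq:
            return None
    return value


memtable = {k: (1, f"v{k}") for k in range(100)}
range_tombstones = []
range_delete_tombstone(range_tombstones, 10, 60, seq=2)
print(point_lookup(memtable, range_tombstones, 20))   # covered by the range delete
print(point_lookup(memtable, range_tombstones, 70))   # outside the deleted range
```

The linear scan in `point_lookup` is the toy analogue of the lookup overhead the abstract attributes to range tombstones; real systems organize range tombstones differently, so the sketch conveys only the direction of the trade-off, not its magnitude.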

Problem

Research questions and friction points this paper is trying to address.

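Range deletes implemented with per-key tombstones force LSM-trees to insert numerous tombstones, causing severe performance issues
Range tombstones fix range delete efficiency but slow point lookups: latency can increase by 30% even with just 1% range deletes in the workload
Current systems lack a range delete method that stays efficient without compromising point lookup performance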

Innovation

Methods, ideas, or system contributions that make the work stand out.

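GLORAN introduces a global index that lets point lookups locate relevant deleted ranges quickly, reducing I/O complexity from O(N/lambda) to O(log^2 N/(lambda F)) or O(phi log N/F)
An entry validity estimator further reduces the expected I/O cost of looking up existing keys to O(epsilon log^2 N/(lambda F))
GLORAN keeps range deletes efficient without compromising point lookups, achieving up to 10.6 times faster point lookups and 2.7 times higher overall throughput than the SOTA method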
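
To make the idea of a global index over deleted ranges more concrete, here is a minimal sketch of one generic way to answer "is this entry covered by a newer range delete?" in logarithmic time. It is not GLORAN's design, which this summary does not describe in detail: the class `RangeDeleteIndex`, its methods, and the use of sequence numbers are assumptions made for illustration, and the structure lives entirely in memory rather than on disk.

```python
import bisect


class RangeDeleteIndex:
    """Hypothetical in-memory index of range deletes (not GLORAN's structure).

    The key space is cut into segments: keys[i] starts a segment whose newest
    covering range-delete sequence number is seqs[i] (0 means "not deleted").
    """

    def __init__(self):
        self.keys = []   # sorted segment start keys
        self.seqs = []   # seqs[i] applies to [keys[i], keys[i + 1])

    def _seq_at(self, key):
        # Binary search for the segment that contains key.
        i = bisect.bisect_right(self.keys, key) - 1
        return self.seqs[i] if i >= 0 else 0

    def add_range_delete(self, start, end, seq):
        """Record a delete of [start, end), with start < end.

        Assumes seq is newer than every previously recorded sequence number,
        so the new range simply overwrites whatever it overlaps.
        """
        seq_after = self._seq_at(end)            # state that resumes at `end`
        lo = bisect.bisect_left(self.keys, start)
        hi = bisect.bisect_right(self.keys, end)
        self.keys[lo:hi] = [start, end]
        self.seqs[lo:hi] = [seq, seq_after]

    def is_deleted(self, key, entry_seq):
        """True if a range delete newer than entry_seq covers key."""
        return self._seq_at(key) > entry_seq


index = RangeDeleteIndex()
index.add_range_delete(10, 20, seq=5)
index.add_range_delete(15, 30, seq=8)
print(index.is_deleted(12, entry_seq=4))   # written before the first delete
print(index.is_deleted(12, entry_seq=6))   # written after it
```

A lookup here is a single binary search instead of a scan over every recorded range, which is the general effect the first bullet describes. Insertion in this sketch costs linear time because of list slicing, so it says nothing about the I/O bounds quoted above.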