🤖 AI Summary
Problem: LSM-trees struggle to optimize read performance, write throughput, and space efficiency at the same time, because the vertical (level-adding) and horizontal (capacity-expanding) growth schemes each impose their own trade-offs.
Method: This paper proposes Vertiorizon, a novel adaptive hybrid growth strategy that combines both schemes. Grounded in a non-trivial generalization of Bentley–Saxe theory, it formulates a three-objective model that jointly balances lookup, update, and space costs and adapts the tree structure to the workload. The design pairs asymptotic cost modeling with an integration into RocksDB.
Contribution/Results: When integrated with RocksDB, Vertiorizon delivers better write performance than the pure vertical scheme, incurs about six times less additional space cost than the pure horizontal scheme, and greatly extends the achievable range of the read-write trade-off. The work offers a theoretically grounded and practically deployable hybrid growth framework for LSM-tree structure design.
📝 Abstract
LSM-tree-based key-value stores are widely adopted as the storage backend in modern big data applications. An LSM-tree grows with data ingestion either by adding levels with fixed level capacities (the vertical scheme) or by increasing level capacities with a fixed number of levels (the horizontal scheme). The vertical scheme dominates recent system designs such as RocksDB, LevelDB, and WiredTiger, whereas industry adoption of the horizontal scheme has declined. The growth scheme profoundly affects LSM-tree performance in terms of read, write, and space costs. This paper offers new insight into a fundamental design question: how should an LSM-tree grow to attain more desirable performance? Our analysis highlights the limitations of the vertical scheme in achieving an optimal read-write trade-off and of the horizontal scheme in managing space cost effectively. Building on this analysis, we present a novel approach, Vertiorizon, which combines the strengths of both schemes to achieve a superior balance among lookup, update, and space costs. Its adaptive design makes it compatible with a wide spectrum of workloads. Compared to the vertical scheme, Vertiorizon significantly improves the read-write performance trade-off. Compared to the horizontal scheme, Vertiorizon greatly extends the trade-off range through a non-trivial generalization of Bentley and Saxe's theory, while substantially reducing space costs. When integrated with RocksDB, Vertiorizon demonstrates better write performance than the vertical scheme while incurring about six times less additional space cost than the horizontal scheme.
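The difference between the two growth schemes can be made concrete with a small sketch. It is not Vertiorizon's algorithm, only a simplified model of the two baseline schemes under stated assumptions: level i holds `buffer_entries * size_ratio**i` entries, and the function names and parameters are hypothetical choices made for illustration.

```python
def vertical_levels(n_entries, buffer_entries, size_ratio):
    """Vertical scheme (simplified): the size ratio is fixed, so the tree
    adds levels until their combined capacity can hold n_entries.
    Assumes level i has capacity buffer_entries * size_ratio**i."""
    levels = 0
    total_capacity = 0
    while total_capacity < n_entries:
        levels += 1
        total_capacity += buffer_entries * size_ratio ** levels
    return levels


def horizontal_ratio(n_entries, buffer_entries, num_levels):
    """Horizontal scheme (simplified): the number of levels is fixed, so the
    size ratio grows until the last level alone can hold n_entries.
    Solves buffer_entries * ratio**num_levels == n_entries for ratio."""
    return (n_entries / buffer_entries) ** (1.0 / num_levels)


if __name__ == "__main__":
    # Grow the data set 100x and observe which quantity changes.
    for n in (1_000, 100_000):
        print(
            f"N={n}: vertical uses {vertical_levels(n, 10, 10)} levels at ratio 10; "
            f"horizontal keeps 2 levels at ratio {horizontal_ratio(n, 10, 2):.1f}"
        )
```

In this model, growing the data from 1,000 to 100,000 entries adds levels under the vertical scheme while the ratio stays fixed, whereas the horizontal scheme keeps its level count and enlarges the ratio between adjacent levels. That is the structural difference behind the read, write, and space trade-offs the abstract describes.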