BLI: A High-performance Bucket-based Learned Index with Concurrency Support

📅 2025-02-14
🤖 AI Summary
Learned indexes typically rely on strictly sorted arrays, which raises insertion overhead, makes lock-free concurrency hard to support, and hurts in-node lookup/insertion efficiency when the model is inaccurate. To address these issues, this paper proposes the Bucket-based Learned Index (BLI), an updatable in-memory learned index. Its core idea is a "globally sorted, locally unsorted" structure: Buckets replace strictly sorted arrays, so only the Buckets are kept in order while the key-value pairs inside each Bucket are not. BLI keeps maintenance lightweight by performing bulk loading, insertion, node scaling, node splitting, node merging, and model retraining adaptively. The unsorted Bucket design also allows it to support lock-free concurrency. Reported results show up to 2.21× higher throughput than state-of-the-art learned indexes, and up to 3.91× under multi-threaded workloads.
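
The "globally sorted, locally unsorted" idea can be illustrated with a minimal single-threaded sketch. This is not BLI's implementation: the class name `BucketIndex`, the `capacity` parameter, the flat (non-tree) layout, and the split-in-half policy are all assumptions made for illustration, and a plain binary search stands in for the learned model.

```python
import bisect


class BucketIndex:
    """Toy index: buckets are ordered by key range, pairs inside a bucket are not."""

    def __init__(self, capacity=4):
        self.capacity = capacity        # hypothetical maximum number of pairs per bucket
        self.fences = [float("-inf")]   # fences[i] = smallest key bucket i may hold
        self.buckets = [[]]             # each bucket is an unsorted list of (key, value)

    def _locate(self, key):
        # Search the bucket boundaries only; this is the "globally sorted" part.
        return bisect.bisect_right(self.fences, key) - 1

    def insert(self, key, value):
        i = self._locate(key)
        bucket = self.buckets[i]
        for j, (k, _) in enumerate(bucket):
            if k == key:
                bucket[j] = (key, value)   # overwrite an existing key
                return
        bucket.append((key, value))        # append without shifting other pairs
        if len(bucket) > self.capacity:
            self._split(i)

    def _split(self, i):
        # Pairs are sorted only when a bucket overflows, not on every insert.
        pairs = sorted(self.buckets[i], key=lambda p: p[0])
        mid = len(pairs) // 2
        self.buckets[i] = pairs[:mid]
        self.buckets.insert(i + 1, pairs[mid:])
        self.fences.insert(i + 1, pairs[mid][0])

    def get(self, key):
        # Linear scan inside one small bucket; this is the "locally unsorted" part.
        for k, v in self.buckets[self._locate(key)]:
            if k == key:
                return v
        return None
```

In this sketch an insert touches a single bucket and never shifts elements in a large sorted array, which is the property the summary credits for cheaper insertion. Lookups pay for that with a short scan inside the bucket, so the bucket capacity controls the trade-off.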

📝 Abstract
Learned indexes are promising replacements for traditional tree-based indexes. They typically employ machine learning models to efficiently predict target positions in strictly sorted linear arrays. However, the strict sorted order 1) significantly increases insertion overhead, 2) makes it challenging to support lock-free concurrency, and 3) harms in-node lookup/insertion efficiency due to model inaccuracy. In this paper, we introduce a Bucket-based Learned Index (BLI), which is an updatable in-memory learned index that adopts a "globally sorted, locally unsorted" approach by replacing linear sorted arrays with Buckets. BLI optimizes insertion throughput by sorting only the Buckets, not the key-value pairs within a Bucket. BLI strategically balances four critical performance metrics: tree fanout, lookup/insert latency for inner nodes, lookup/insert latency for leaf nodes, and memory consumption. To minimize maintenance costs, BLI adaptively performs lightweight bulk loading, insertion, node scaling, node splitting, model retraining, and node merging. BLI supports lock-free concurrency thanks to the unsorted design with Buckets. Our results show that BLI achieves up to 2.21x better throughput than state-of-the-art learned indexes, with up to 3.91x gains under multi-threaded conditions.
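
To make the "model predicts a position, then the error is corrected" step concrete, here is a hedged sketch of a model that predicts which Bucket a key belongs to. The abstract does not specify BLI's model or correction strategy; the least-squares linear fit, the linear walk used for correction, and the names `fit_linear`, `predict_bucket`, and `exact_bucket` are assumptions chosen for brevity.

```python
import bisect


def fit_linear(fences):
    """Least-squares fit of bucket position ~ slope * key + intercept."""
    n = len(fences)
    mean_x = sum(fences) / n
    mean_y = (n - 1) / 2
    var = sum((x - mean_x) ** 2 for x in fences)
    if var == 0:
        return 0.0, mean_y
    cov = sum((x - mean_x) * (i - mean_y) for i, x in enumerate(fences))
    slope = cov / var
    return slope, mean_y - slope * mean_x


def predict_bucket(model, fences, key):
    """Predict a bucket with the model, then fix the guess by walking the fences."""
    slope, intercept = model
    guess = int(round(slope * key + intercept))
    i = min(max(guess, 0), len(fences) - 1)   # clamp the guess to a valid bucket
    while i > 0 and key < fences[i]:           # model overshot: walk left
        i -= 1
    while i + 1 < len(fences) and key >= fences[i + 1]:   # undershot: walk right
        i += 1
    return i


def exact_bucket(fences, key):
    """Reference answer from plain binary search, used to check the prediction."""
    return max(bisect.bisect_right(fences, key) - 1, 0)
```

Here `fences` is the sorted list of each Bucket's smallest key. Because only Bucket boundaries need to be found, a prediction error costs a few extra boundary comparisons rather than a search through individual key-value pairs.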
Problem

Research questions and friction points this paper is trying to address.

Reduces insertion overhead in learned indexes
Supports lock-free concurrency with unsorted Buckets
Optimizes lookup/insert latency and memory consumption
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bucket-based Learned Index (BLI)
Globally sorted, locally unsorted
Lock-free concurrency support