A Pragmatic Approach to Learned Indexing in RocksDB: Targeted Optimizations with Minimal System Modification

📅 2026-05-22
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing learned indexes struggle to combine high concurrency, durability, and minimal changes to the host system under write-intensive workloads. This work proposes a hierarchical learned indexing architecture that leverages the separation between Memtables and SST files in RocksDB to enable targeted optimizations at both the memory and disk layers. By reusing structural knowledge across Memtables, the approach mitigates the overhead of frequent index reconstruction, while a block-aware, read-only learned index ensures that lookups complete within a single I/O in the worst case, without requiring modifications to the storage layer or read path. Experimental results demonstrate that, across diverse large-scale workloads, the proposed method improves write throughput by up to 1.5× and read throughput by up to 2.1× compared to state-of-the-art systems.
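The summary does not say how the block-aware index reaches its single-I/O bound. The Python sketch below shows one hypothetical way to obtain that property, under stated assumptions: a least-squares line predicts which data block holds a key, the largest prediction error over the blocks' first keys (fence keys) is recorded, and a short in-memory search over the fence keys inside that error window selects exactly one block, so each lookup reads at most one block. The class name `BlockAwareLearnedIndex`, the in-memory fence-key array, integer keys, and the simulated `io_count` are illustrative assumptions, not MountDB's or RocksDB's API.

```python
import bisect
import math
import random


class BlockAwareLearnedIndex:
    """Hypothetical sketch: a linear model predicts the data block for a key.

    `blocks` is a list of non-empty, key-sorted lists of (key, value) pairs
    whose key ranges are disjoint and ascending, like the data blocks of an
    immutable SST file. Keys are assumed to be integers.
    """

    def __init__(self, blocks):
        self.blocks = blocks
        self.fences = [blk[0][0] for blk in blocks]  # first key of each block
        self.io_count = 0  # simulated block reads
        n = len(self.fences)
        if n == 1:
            self.slope, self.intercept = 0.0, 0.0
        else:
            # Least-squares fit of block index against fence key.
            mean_x = sum(self.fences) / n
            mean_y = (n - 1) / 2
            cov = sum((x - mean_x) * (i - mean_y) for i, x in enumerate(self.fences))
            var = sum((x - mean_x) ** 2 for x in self.fences)
            self.slope = cov / var  # positive for distinct ascending fences
            self.intercept = mean_y - self.slope * mean_x
        # Largest prediction error, in blocks, over all fence keys.
        self.err = max(abs(self._predict(x) - i) for i, x in enumerate(self.fences))

    def _predict(self, key):
        return self.slope * key + self.intercept

    def _find_block(self, key):
        """Return the index of the only block that can hold `key`, or None."""
        n = len(self.fences)
        if key < self.fences[0]:
            return None  # smaller than every key in the file
        pred = self._predict(key)
        # With a non-negative slope, a key in block b predicts at least b - err
        # and (unless b is the last block) at most b + 1 + err, so b lies in
        # [pred - 1 - err, pred + err]. One extra block of slack per side
        # absorbs float rounding; clamping covers keys past the last fence.
        # This narrowed search touches only in-memory fence keys.
        lo = min(n - 1, max(0, math.floor(pred - self.err) - 2))
        hi = min(n - 1, max(0, math.ceil(pred + self.err) + 1))
        return bisect.bisect_right(self.fences, key, lo, hi + 1) - 1

    def _read_block(self, b):
        self.io_count += 1  # stands in for one disk read
        return self.blocks[b]

    def get(self, key):
        b = self._find_block(key)
        if b is None:
            return None
        block = self._read_block(b)  # the single I/O of this lookup
        keys = [k for k, _ in block]
        i = bisect.bisect_left(keys, key)
        if i < len(keys) and keys[i] == key:
            return block[i][1]
        return None


# Usage: 5,000 random keys packed into blocks of 64 entries.
random.seed(7)
keys = sorted(random.sample(range(1_000_000), 5_000))
blocks = [[(k, f"v{k}") for k in keys[i:i + 64]] for i in range(0, len(keys), 64)]
index = BlockAwareLearnedIndex(blocks)
print(index.get(keys[1234]) == f"v{keys[1234]}", index.io_count)  # True 1
```

In this sketch the model only shortens the in-memory search, from a binary search over all fence keys to a few candidates inside the error window; a design that dropped the fence keys entirely would need a different argument for its single-I/O guarantee.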
📝 Abstract
Learned indexes have emerged as a promising alternative to traditional index structures, offering higher throughput and lower memory usage by approximating the cumulative distribution function (CDF) of the keys with lightweight models. Despite these benefits, adoption in production systems remains limited, partly because no learned index yet supports concurrency and persistence as effectively as the B+-tree, and many research prototypes introduce substantial complexity. In this paper, we investigate whether off-the-shelf learned indexes can be integrated into a production database with minimal storage-engine redesign. Using RocksDB as a case study, we exploit its separation between in-memory Memtables and immutable on-disk files to deploy specialized indexes at each level. We show that directly applying existing learned indexes is insufficient under write-heavy workloads because frequent Memtable replacement prevents models from fully adapting. To address this, we introduce a reuse mechanism that preserves structural knowledge across Memtable instances. At the storage level, we replace RocksDB's on-disk index with a learned index without modifying the storage layer or read path. We further adapt a read-only learned index to be block-aware, enabling worst-case single-I/O lookups. We implement these techniques in MountDB, an extension of RocksDB. Experiments on large-scale workloads with diverse data distributions and access patterns show up to 1.5× higher write throughput and 2.1× higher read throughput than state-of-the-art systems, demonstrating that established learned indexes can be integrated into production systems with minimal overhead and substantial performance benefits.
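The abstract states that learned indexes approximate the key CDF with a lightweight model and that MountDB reuses structural knowledge across Memtable instances, but it does not describe the mechanism. The Python sketch below is one hypothetical reading under simple assumptions: a single least-squares line approximates the CDF, keys are routed into sorted buckets by that model, and when a Memtable is sealed its successor starts with a model refit on the sealed Memtable's keys instead of starting untrained. `LinearCDF`, `LearnedMemtable`, `successor()`, the bootstrap sample, the bucket count, and the flush threshold are all illustrative names and values, not the paper's design.

```python
import bisect
import random


class LinearCDF:
    """Least-squares line fitted to the empirical CDF of a sorted key sample."""

    def __init__(self, sorted_keys):
        n = len(sorted_keys)
        ys = [(i + 0.5) / n for i in range(n)]  # empirical CDF at each key
        mean_x = sum(sorted_keys) / n
        mean_y = sum(ys) / n
        var = sum((x - mean_x) ** 2 for x in sorted_keys)
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(sorted_keys, ys))
        # A non-negative slope keeps the model monotone in the key.
        self.slope = max(cov / var, 0.0) if var else 0.0
        self.intercept = mean_y - self.slope * mean_x

    def __call__(self, key):
        # Clamp to [0, 1] so keys outside the training range stay valid.
        return min(1.0, max(0.0, self.slope * key + self.intercept))


class LearnedMemtable:
    """Hypothetical Memtable that routes keys to sorted buckets via a CDF model."""

    def __init__(self, model, num_buckets=256):
        self.model = model
        self.buckets = [[] for _ in range(num_buckets)]
        self.size = 0  # number of distinct keys

    def _slot(self, key):
        b = int(self.model(key) * len(self.buckets))
        return min(b, len(self.buckets) - 1)  # model(key) == 1.0 maps past the end

    def put(self, key, value):
        bucket = self.buckets[self._slot(key)]
        i = bisect.bisect_left(bucket, (key,))  # first entry with entry key >= key
        if i < len(bucket) and bucket[i][0] == key:
            bucket[i] = (key, value)  # a newer write replaces the older one
        else:
            bucket.insert(i, (key, value))
            self.size += 1

    def get(self, key):
        bucket = self.buckets[self._slot(key)]
        i = bisect.bisect_left(bucket, (key,))
        if i < len(bucket) and bucket[i][0] == key:
            return bucket[i][1]
        return None

    def items(self):
        # A monotone model keeps buckets in key order, so concatenating them
        # yields a sorted run ready to be flushed.
        return [kv for bucket in self.buckets for kv in bucket]

    def successor(self):
        # Reuse: the next Memtable starts with a model refit on this one's keys
        # rather than with no knowledge of the key distribution.
        keys = [k for k, _ in self.items()]
        model = LinearCDF(keys) if keys else self.model
        return LearnedMemtable(model, len(self.buckets))


# Usage: bootstrap from a small sample, then seal a Memtable every 4,000 keys.
random.seed(3)
stream = [random.randint(0, 1_000_000) for _ in range(20_000)]
mem = LearnedMemtable(LinearCDF(sorted(stream[:100])))
flushed = []
for key in stream:
    mem.put(key, key * 10)
    if mem.size >= 4_000:  # hypothetical flush threshold
        flushed.append(mem.items())
        mem = mem.successor()
```

Keeping the model monotone (non-negative slope, clamped output) is what lets the buckets be concatenated into a sorted run at flush time; a single line is only adequate for near-uniform keys, and a skewed workload would call for a richer model.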
Problem

Research questions and friction points this paper is trying to address.

learned indexing
RocksDB
production database integration
write-heavy workloads
minimal system modification
Innovation

Methods, ideas, or system contributions that make the work stand out.

learned indexing
RocksDB
Memtable reuse
block-aware index
minimal system modification