BMTree: Designing, Learning, and Updating Piecewise Space-Filling Curves for Multi-Dimensional Data Indexing

📅 2025-05-03
🤖 AI Summary
Existing space-filling curves (SFCs) apply one mapping to the whole data space, so they cannot adapt to differences in distribution across multi-dimensional subspaces, which degrades indexing performance. This paper proposes BMTree, the first dynamic learned index supporting piecewise SFCs, which jointly optimizes subspace-specific SFC generation and tree-structure construction via deep reinforcement learning. It also introduces a streaming mechanism that detects distribution shifts and incrementally updates only the affected local models, avoiding costly full rebuilds. Compared with classical SFCs (e.g., Hilbert, Z-order) and state-of-the-art learned indexes, BMTree achieves significant improvements in range and k-nearest-neighbor query performance, reduces training overhead by over 60%, and cuts dynamic update latency by 85%. BMTree is the first to realize fine-grained adaptivity and efficient online evolution of SFC mappings.

📝 Abstract
Space-filling curves (SFCs) are widely used to index multi-dimensional data: an SFC first maps the data to one dimension, and a one-dimensional indexing method, e.g., the B-tree, then indexes the mapped data. Existing SFCs adopt a single mapping scheme for the whole data space. However, a single mapping scheme often does not perform well across the entire data space. In this paper, we propose a new type of SFC, called piecewise SFCs, that adopts different mapping schemes for different data subspaces. Specifically, we propose a data structure termed the Bit Merging tree (BMTree) that can generate data subspaces and their SFCs simultaneously, and achieve desirable properties of the SFC for the whole data space. Furthermore, we develop a reinforcement learning-based solution to build the BMTree, aiming to achieve excellent query performance. To update the BMTree efficiently when the distributions of data and/or queries change, we develop a new mechanism that quickly detects distribution shifts in data and queries and enables partial retraining of the BMTree. Because it avoids retraining the BMTree from scratch, the retraining mechanism improves performance efficiently. Extensive experiments show the effectiveness and efficiency of the BMTree with the proposed learning-based methods.
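
The map-then-index pipeline described in the abstract can be illustrated with a classical Z-order curve. The sketch below is not the paper's method: it assumes 2-D non-negative integer coordinates, uses a sorted Python list with binary search as a stand-in for a B-tree, and the names `z_order`, `build_index`, and `range_query` are hypothetical.

```python
from bisect import bisect_left, bisect_right

def z_order(x: int, y: int, bits: int = 8) -> int:
    """Interleave the bits of x and y into one Z-order key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)        # x bit goes to the even position
        key |= ((y >> i) & 1) << (2 * i + 1)    # y bit goes to the odd position
    return key

def build_index(points, bits: int = 8):
    """Sort points by SFC key; a sorted list stands in for a B-tree here."""
    return sorted((z_order(x, y, bits), (x, y)) for x, y in points)

def range_query(index, lo, hi, bits: int = 8):
    """Return the points inside the rectangle with inclusive corners lo and hi."""
    keys = [k for k, _ in index]
    # Z-order is monotone in each coordinate, so every point in the rectangle
    # has a key between the keys of the two corners.
    start = bisect_left(keys, z_order(lo[0], lo[1], bits))
    end = bisect_right(keys, z_order(hi[0], hi[1], bits))
    # The key interval may also contain points outside the rectangle,
    # so filter the candidates by their real coordinates.
    return [p for _, p in index[start:end]
            if lo[0] <= p[0] <= hi[0] and lo[1] <= p[1] <= hi[1]]
```

The number of false candidates scanned between the two corner keys depends on how well the curve matches the data and query distributions, which is the cost that a better-chosen mapping aims to reduce.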
Problem

Research questions and friction points this paper is trying to address.

Improving multi-dimensional data indexing with piecewise space-filling curves
Designing adaptive mapping schemes for different data subspaces
Efficiently updating indexes for dynamic data and query distributions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Piecewise SFCs with varied mapping schemes
BMTree for simultaneous subspace and SFC generation
Reinforcement learning for efficient BMTree construction
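
To make the idea of a piecewise SFC concrete, the toy sketch below gives each half of a 2-D space its own bit-merging order. It is an illustration under stated assumptions, not the BMTree algorithm: the split on the top bit of `x`, the two patterns in `PATTERNS`, and the helper names `merge_bits` and `piecewise_key` are all hypothetical, and nothing here is learned.

```python
def merge_bits(x: int, y: int, pattern: str, bits: int) -> int:
    """Build a key by taking bits of x and y, most significant first,
    in the order given by pattern (a string of 'X' and 'Y')."""
    nx = ny = bits          # number of bits still unused in each dimension
    key = 0
    for dim in pattern:
        if dim == "X":
            nx -= 1
            bit = (x >> nx) & 1
        else:
            ny -= 1
            bit = (y >> ny) & 1
        key = (key << 1) | bit
    return key

# Hypothetical piecewise scheme for 3-bit coordinates: the top bit of x
# selects a subspace, and each subspace has its own merging pattern.
# Both patterns start with 'X', so keys of different subspaces never collide.
PATTERNS = {
    0: "XYXYXY",   # left half: alternate dimensions (Z-order-like)
    1: "XXXYYY",   # right half: all x bits first (column-major-like)
}

def piecewise_key(x: int, y: int, bits: int = 3) -> int:
    """Map (x, y) to a key using the pattern of the subspace it falls in."""
    top = (x >> (bits - 1)) & 1
    return merge_bits(x, y, PATTERNS[top], bits)
```

Because every pattern consumes each coordinate bit exactly once and the subspaces are separated by a shared prefix, the combined mapping still assigns a distinct key to every cell of the 8 x 8 grid.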
Authors
Jiangneng Li
Nanyang Technological University, Singapore, 639798
Yuang Liu
Nanyang Technological University, Singapore, 639798
Zheng Wang
Nanyang Technological University, Singapore, 639798
Gao Cong
Nanyang Technological University
Data Management, Databases, Data Mining, Spatial Databases
Cheng Long
Nanyang Technological University
databases, machine learning, data mining
W. Aref
Purdue University, West Lafayette, IN, USA, 47907
Han Mao Kiah
School of Physical and Mathematical Sciences, Nanyang Technological University
Coding Theory, Combinatorics
Bin Cui
School of Computer Science, Peking University, Beijing, China, 100871