🤖 AI Summary
This work addresses the challenge of efficiently supporting numeric range constraints over multiple attributes in high-dimensional nearest neighbor search. The authors propose KHI, an index structure for range-filtered approximate k-nearest neighbor queries with multi-attribute predicates, a setting to which existing single-attribute indexes extend poorly. KHI combines a hierarchical partitioning tree over the attribute space with Hierarchical Navigable Small World (HNSW) graphs attached to the tree nodes. A skew-aware splitting strategy bounds the tree height by $O(\log n)$, and queries are answered by routing through the tree and running greedy search on the HNSW graphs. Experiments on four real-world datasets show that KHI improves query throughput (QPS) over the state-of-the-art baseline by 2.46× on average and by up to 16.22×, with larger gains for smaller selectivity, larger k, and higher predicate cardinality.
📝 Abstract
Nearest neighbor search on high-dimensional vectors is fundamental in modern AI and database systems. In many real-world applications, queries involve constraints on multiple numeric attributes, giving rise to range-filtering approximate nearest neighbor search (RFANNS). While there exist RFANNS indexes for single-attribute range predicates, extending them to the multi-attribute setting is nontrivial and often ineffective. In this paper, we propose KHI, an index for multi-attribute RFANNS that combines an attribute-space partitioning tree with HNSW graphs attached to tree nodes. A skew-aware splitting rule bounds the tree height by $O(\log n)$, and queries are answered by routing through the tree and running greedy search on the HNSW graphs. Experiments on four real-world datasets show that KHI consistently achieves high query throughput while maintaining high recall. Compared with the state-of-the-art RFANNS baseline, KHI improves QPS by $2.46\times$ on average and up to $16.22\times$ on the hard dataset, with larger gains for smaller selectivity, larger $k$, and higher predicate cardinality.
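The general shape of such an index, a partitioning tree over the attribute space with a per-node vector search, can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions. A plain median split on alternating attributes stands in for the skew-aware splitting rule, which the abstract does not specify; an exact brute-force scan stands in for the per-node HNSW graph; and the names `Node`, `build`, `search`, `in_range`, `sq_dist`, and `leaf_size` are hypothetical. Because each median split halves a node, the tree height in this sketch is logarithmic in the number of points.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional, Tuple

# A point is (vector, attributes); both are lists of floats.
Point = Tuple[List[float], List[float]]

@dataclass
class Node:
    ids: List[int]                  # ids of all points in this subtree
    lo: List[float]                 # lower corner of the attribute bounding box
    hi: List[float]                 # upper corner of the attribute bounding box
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def build(points: List[Point], ids: List[int], depth: int = 0,
          leaf_size: int = 8) -> Node:
    """Build the tree with a median split (assumption: NOT the skew-aware rule)."""
    m = len(points[0][1])
    lo = [min(points[i][1][d] for i in ids) for d in range(m)]
    hi = [max(points[i][1][d] for i in ids) for d in range(m)]
    node = Node(ids, lo, hi)
    if len(ids) > leaf_size:
        d = depth % m               # cycle through the attributes
        order = sorted(ids, key=lambda i: points[i][1][d])
        mid = len(order) // 2
        node.left = build(points, order[:mid], depth + 1, leaf_size)
        node.right = build(points, order[mid:], depth + 1, leaf_size)
    return node

def sq_dist(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def in_range(attrs: List[float], qlo: List[float], qhi: List[float]) -> bool:
    return all(l <= a <= h for a, l, h in zip(attrs, qlo, qhi))

def search(points: List[Point], node: Node, q: List[float],
           qlo: List[float], qhi: List[float], k: int) -> List[Tuple[float, int]]:
    """Return up to k (squared distance, id) pairs whose attributes lie in [qlo, qhi]."""
    m = len(qlo)
    # Prune subtrees whose bounding box is disjoint from the query box.
    if any(node.hi[d] < qlo[d] or node.lo[d] > qhi[d] for d in range(m)):
        return []
    covered = all(qlo[d] <= node.lo[d] and node.hi[d] <= qhi[d] for d in range(m))
    if covered or node.left is None:
        # A fully covered node needs no per-point filtering; a partially
        # covered leaf is filtered point by point.
        cands = node.ids if covered else [
            i for i in node.ids if in_range(points[i][1], qlo, qhi)]
        # Brute-force scan: a stand-in for searching the node's HNSW graph.
        return heapq.nsmallest(k, ((sq_dist(points[i][0], q), i) for i in cands))
    # Partially covered internal node: route into both children and merge.
    merged = (search(points, node.left, q, qlo, qhi, k)
              + search(points, node.right, q, qlo, qhi, k))
    return heapq.nsmallest(k, merged)
```

A call such as `search(points, build(points, list(range(len(points)))), q, qlo, qhi, k)` returns exact filtered neighbors, because the stand-in scan is exact. Replacing the scan with an approximate graph search per node is the step that would trade recall for throughput.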