🤖 AI Summary
This work addresses the high storage and construction overhead of existing methods for range-filtered approximate nearest neighbor (RFANN) queries, which typically require multiple range-specific graph indexes. The authors propose RNSG, a graph index grounded in the newly formalized theory of range-aware relative neighborhood graphs (RRNGs). They prove that RRNGs possess monotonic searchability and structural heredity, so a single graph structure can efficiently support arbitrary range queries. By combining range-aware graph construction, a beam-search query strategy, and an efficient approximate construction algorithm, RNSG outperforms state-of-the-art approaches on five real-world datasets, achieving higher query performance while reducing index size and construction cost.
📝 Abstract
Range-filtered approximate nearest neighbor (RFANN) search is a fundamental operation in modern data systems. Given a set of objects, each with a vector and a numerical attribute, an RFANN query retrieves the nearest neighbors to a query vector among those objects whose numerical attributes fall within the range specified by the query. Existing state-of-the-art methods for RFANN search often require constructing multiple range-specific graph indexes to achieve high query performance, which incurs significant indexing overhead. To address this, we first establish a novel graph indexing theory, the range-aware relative neighborhood graph (RRNG), which jointly considers spatial and attribute proximity. We prove that the RRNG satisfies two crucial properties: (1) monotonic searchability, which ensures correct nearest neighbor retrieval via beam search; and (2) structural heredity, which guarantees that any range-induced subgraph remains a valid RRNG, thus enabling efficient search with a single graph index. Based on this theoretical foundation, we propose a new graph index called RNSG as a practical solution that efficiently approximates the RRNG. We develop fast algorithms for both constructing the RNSG index and processing RFANN queries with it. Extensive experiments on five real-world datasets show that RNSG achieves significantly higher query performance with a more compact index and lower construction cost than existing state-of-the-art methods.
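
To make the query definition concrete, here is a minimal brute-force sketch of the exact result an RFANN query approximates. It is not the RNSG index or any algorithm from the paper: it simply scans every object, keeps those whose attribute lies in the query range, and returns the k closest by Euclidean distance. The function name `range_filtered_knn`, the closed interval `[lo, hi]`, the Euclidean metric, and the sample data are all assumptions made for illustration.

```python
import heapq
import math
from typing import List, Sequence


def range_filtered_knn(
    vectors: Sequence[Sequence[float]],
    attrs: Sequence[float],
    query: Sequence[float],
    lo: float,
    hi: float,
    k: int,
) -> List[int]:
    """Exact range-filtered k-NN by linear scan (hypothetical baseline).

    Returns the indices of the k objects nearest to `query` among those
    whose attribute falls in the closed range [lo, hi] (an assumption;
    the source does not state whether the range is open or closed).
    """
    # Lazily yield (distance, index) pairs for in-range objects only.
    candidates = (
        (math.dist(vec, query), idx)
        for idx, (vec, attr) in enumerate(zip(vectors, attrs))
        if lo <= attr <= hi
    )
    # nsmallest returns the pairs sorted by distance, nearest first.
    return [idx for _, idx in heapq.nsmallest(k, candidates)]


# Illustrative data: four 2-D vectors, each with one numerical attribute.
vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
attrs = [10, 20, 30, 40]

# Object 0 is closest to the query but is excluded by the range [15, 40].
result = range_filtered_knn(vectors, attrs, query=(0.1, 0.0), lo=15, hi=40, k=2)
print(result)  # [1, 2]
```

A linear scan like this costs time proportional to the dataset size per query, which is the cost that graph indexes aim to avoid; it is mainly useful as ground truth when measuring the recall of an approximate method.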