Revisiting the Index Construction of Proximity Graph-Based Approximate Nearest Neighbor Search

📅 2024-10-02
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Problem: The index construction cost of proximity graph (PG)-based k-approximate nearest neighbor (k-ANN) methods is superlinear in the number of data points, which limits their scalability on large high-dimensional datasets. Method: This paper proposes a general construction framework for accelerating PG index building. It revisits the construction process of two mainstream PG categories, the relative neighborhood graph (RNG) and the navigable small world graph (NSWG), identifies where construction is inefficient, and introduces a pruning strategy for edge selection that accelerates RNG construction while keeping its k-ANN search performance. The framework is then integrated into NSWG construction to improve both its construction efficiency and its k-ANN search performance. Results: Extensive experiments on both RNG and NSWG show up to 5.6× speedup in index construction without compromising k-ANN search performance.

📝 Abstract
Proximity graphs (PG) have gained increasing popularity as the state-of-the-art solutions to $k$-approximate nearest neighbor ($k$-ANN) search on high-dimensional data, which serves as a fundamental function in various fields, e.g., retrieval-augmented generation. Although PG-based approaches have the best $k$-ANN search performance, their index construction cost is superlinear in the number of points. Such superlinear cost substantially limits their scalability in the era of big data. Hence, the goal of this paper is to accelerate the construction of PG-based methods without compromising their $k$-ANN search performance. To achieve this goal, two mainstream categories of PG are revisited: relative neighborhood graph (RNG) and navigable small world graph (NSWG). By revisiting their construction processes, we identify the sources of their construction inefficiency. To address these issues, we propose a new construction framework with a novel pruning strategy for edge selection, which accelerates RNG construction while keeping its $k$-ANN search performance. Then, we integrate this framework into NSWG construction to enhance both the construction efficiency and $k$-ANN search performance of NSWG. Extensive experiments validate our construction framework for both RNG and NSWG and show that it significantly reduces the PG construction cost, achieving up to 5.6x speedup, while not compromising the $k$-ANN search performance.

Problem

Research questions and friction points this paper is trying to address.

Accelerating proximity graph index construction
Reducing superlinear construction cost
Maintaining k-ANN search performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Pruning strategy accelerates RNG construction
Framework enhances NSWG construction efficiency
Achieves up to 5.6x construction speedup without compromising k-ANN search performance
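
For orientation, the edge-selection step in RNG-style graphs can be sketched as below. This is a minimal sketch of the classic RNG-style selection rule found in well-known PG indexes, not the paper's new pruning strategy, which this summary does not describe in detail. The function name `rng_select`, its parameters, and the use of Euclidean distance are assumptions made for illustration.

```python
import math


def rng_select(p, candidates, max_degree, dist=math.dist):
    """Choose out-neighbors of point p with the classic RNG-style rule.

    Hypothetical helper for illustration only; not the paper's method.
    """
    # Visit candidates from nearest to farthest from p.
    ordered = sorted(candidates, key=lambda c: dist(p, c))
    selected = []
    for c in ordered:
        if len(selected) >= max_degree:
            break
        # Keep c only if it is strictly closer to p than to every
        # neighbor already selected; otherwise an existing edge already
        # covers that direction and c is dropped.
        if all(dist(p, c) < dist(c, s) for s in selected):
            selected.append(c)
    return selected


if __name__ == "__main__":
    p = (0.0, 0.0)
    candidates = [(1.0, 0.0), (2.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
    print(rng_select(p, candidates, max_degree=4))
```

In this rule each candidate may be compared against every neighbor selected so far, and those distance computations are repeated for every point inserted into the graph. That gives a rough sense of why edge selection contributes to construction cost.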

Shuo Yang
Xidian University
Jiadong Xie
The Chinese University of Hong Kong
Yingfan Liu
Xidian University
Vector Database, High-performance Computations
Jeffrey Xu Yu
The Chinese University of Hong Kong
Database, Data Mining
Xiyue Gao
Xidian University
Qianru Wang
Xidian University
Urban Computing, Internet of Things
Yanguo Peng
Xidian University
Jiangtao Cui
Xidian University