🤖 AI Summary
This work addresses the challenge of efficiently merging partitioned proximity-graph indexes in large-scale, high-dimensional vector scenarios, where memory constraints force block-wise index construction, yet searching directly across multiple sub-indexes is inefficient and lacks an effective fusion mechanism. To overcome this, the authors propose a Reverse Neighbor Sliding Merge (RNSM) strategy that leverages graph structural information for efficient index integration, together with a Merge Order Selection (MOS) method that eliminates redundant merge operations. Experiments demonstrate up to a 5.48× speedup over existing merging techniques and up to a 9.92× speedup over full index reconstruction, while maintaining efficient and stable approximate nearest neighbor search, scaling to 100 million vectors with 50 partitions.
📝 Abstract
Approximate $k$-nearest-neighbor (AKNN) search in high-dimensional space is a foundational problem in vector databases with widespread applications. Among the numerous AKNN indexes, proximity-graph-based indexes achieve state-of-the-art search efficiency across various benchmarks. However, their extensive distance computations over high-dimensional vectors lead to slow construction and substantial memory overhead. Limited memory capacity often prevents building the entire index at once on large-scale datasets, so a common practice is to build multiple sub-indexes separately. However, searching directly on these separate indexes severely compromises search efficiency, since queries cannot leverage cross-graph connections; efficient graph-index merging is therefore crucial for multi-index search. In this paper, we focus on efficient two-index merging and on the merge order of multiple indexes for AKNN search. To this end, we propose a reverse neighbor sliding merge (RNSM) that exploits structural information to boost merging efficiency, and we further investigate merge order selection (MOS) to reduce the merging cost by eliminating redundant merge operations. Experiments show that our approach yields up to a 5.48$\times$ speedup over existing index-merge methods and a 9.92$\times$ speedup over index reconstruction, while maintaining the expected superior search performance. Moreover, our method scales efficiently to 100 million vectors with 50 partitions, maintaining consistent speedups.
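To make the multi-index baseline concrete, here is a minimal sketch of the setup the abstract criticizes: a dataset split into partitioned sub-indexes, each queried independently, with the per-partition candidate lists merged afterward. This is an illustration only; the partition sizes, the exact-scan `knn_in_partition` helper, and all variable names are assumptions for the example, not the paper's RNSM/MOS method or its graph structures.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset split into partitions (sub-indexes), as when memory
# limits prevent building one index over all vectors at once.
dim, n_per_part, n_parts, k = 16, 1000, 4, 5
partitions = [rng.standard_normal((n_per_part, dim)) for _ in range(n_parts)]

def knn_in_partition(part, query, k):
    """Exact k-NN inside one partition (a stand-in for searching
    one proximity-graph sub-index)."""
    dists = np.linalg.norm(part - query, axis=1)
    idx = np.argsort(dists)[:k]
    return [(float(dists[i]), int(i)) for i in idx]

query = rng.standard_normal(dim)

# Naive multi-index search: query every sub-index separately, then
# merge the candidate lists. The query cost grows with the number of
# partitions, since no cross-graph connections can be exploited.
candidates = []
for pid, part in enumerate(partitions):
    for dist, local_id in knn_in_partition(part, query, k):
        candidates.append((dist, pid, local_id))
candidates.sort()
top_k = candidates[:k]
```

A merged index avoids the per-partition fan-out above: a single traversal can cross what were partition boundaries, which is why efficient index merging (rather than repeated per-partition search or full reconstruction) is the focus of the paper.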