Graph-Based Vector Search: An Experimental Evaluation of the State-of-the-Art

📅 2025-02-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Approximate nearest neighbor (ANN) search over billion-scale, high-dimensional vector collections using graph-based indexing remains computationally challenging. Method: We conduct a systematic benchmark of 12 state-of-the-art graph-based methods on seven real-world datasets of up to 1 billion vectors, classifying them into five design paradigms: seed selection, incremental insertion, neighborhood propagation, neighborhood diversification, and divide-and-conquer. We perform comprehensive multi-dimensional evaluation and ablation studies, employing memory-efficient construction and traversal to achieve millisecond-scale query latency. Contribution/Results: Our analysis finds that the best-performing approaches are typically based on incremental insertion combined with neighborhood diversification, and that the choice of base graph can hurt scalability. Among all methods, NSG, HNSW, and Vamana exhibit the strongest robustness. We further discuss open research directions, such as more sophisticated data-adaptive seed selection and diversification strategies, and offer practical guidelines for graph index design.

📝 Abstract
Vector data is prevalent across business and scientific applications, and its popularity is growing with the proliferation of learned embeddings. Vector data collections often reach billions of vectors with thousands of dimensions, thus increasing the complexity of their analysis. Vector search is the backbone of many critical analytical tasks, and graph-based methods have become the best choice for analytical tasks that do not require guarantees on the quality of the answers. We briefly survey in-memory graph-based vector search, outline the chronology of the different methods, and classify them according to five main design paradigms: seed selection, incremental insertion, neighborhood propagation, neighborhood diversification, and divide-and-conquer. We conduct an exhaustive experimental evaluation of twelve state-of-the-art methods on seven real data collections, with sizes up to 1 billion vectors. We share key insights about the strengths and limitations of these methods; e.g., the best approaches are typically based on incremental insertion and neighborhood diversification, and the choice of the base graph can hurt scalability. Finally, we discuss open research directions, such as the importance of devising more sophisticated data-adaptive seed selection and diversification strategies.
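
To make "neighborhood diversification" more concrete, here is a minimal sketch of one common pruning rule: a candidate is kept as a neighbor only if it is closer to the node than to every neighbor already kept. The abstract does not specify which diversification variants are evaluated, so the rule, the function names (`dist`, `diversify`), and the `max_degree` parameter are assumptions made for illustration, not the paper's implementation.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def diversify(node, candidates, max_degree):
    """Illustrative pruning rule (an assumption, not taken from the paper):
    scan candidates from nearest to farthest and keep one only if it is
    closer to `node` than to every neighbor kept so far."""
    kept = []
    for cand in sorted(candidates, key=lambda c: dist(node, c)):
        if len(kept) >= max_degree:
            break
        if all(dist(node, cand) < dist(cand, k) for k in kept):
            kept.append(cand)
    return kept

node = (0.0, 0.0)
candidates = [(1.0, 0.0), (1.1, 0.1), (0.0, 2.0), (-3.0, 0.0)]
neighbors = diversify(node, candidates, max_degree=3)
print(neighbors)
```

In this toy input, `(1.1, 0.1)` is dropped because it lies almost on top of the already kept `(1.0, 0.0)`, so the retained edges point in different directions instead of clustering in one region.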
Problem

Research questions and friction points this paper is trying to address.

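Evaluate twelve state-of-the-art in-memory graph-based vector search methods on seven real data collections
Analyze how these methods scale to collections of up to 1 billion vectors
Identify which design paradigms the best-performing methods rely on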
Innovation

Methods, ideas, or system contributions that make the work stand out.

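Survey and chronology of in-memory graph-based vector search methods, classified into five design paradigms
Finding that the best approaches are typically based on incremental insertion
Finding that neighborhood diversification is a common ingredient of the best approaches, with data-adaptive diversification and seed selection left as open directions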
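
As a rough illustration of how seed selection, beam search, and incremental insertion fit together, the sketch below builds a small graph by inserting vectors one at a time: each new vector is located with a best-first beam search that starts from a fixed seed node, and is then linked to the closest nodes found. All names (`beam_search`, `build_incrementally`, `max_degree`, `beam_width`) and the choice of node 0 as the seed are hypothetical simplifications. The methods evaluated in the paper use more elaborate seed selection and edge pruning.

```python
import heapq
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def beam_search(vectors, graph, query, seed, beam_width):
    """Best-first search from `seed`; returns up to `beam_width`
    (distance, node_id) pairs sorted by increasing distance."""
    visited = {seed}
    d0 = dist(vectors[seed], query)
    frontier = [(d0, seed)]   # min-heap of nodes still to expand
    best = [(-d0, seed)]      # max-heap (negated distances) holding the beam
    while frontier:
        d, node = heapq.heappop(frontier)
        if len(best) >= beam_width and d > -best[0][0]:
            break  # nearest unexpanded node is worse than the whole beam
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d_nb = dist(vectors[nb], query)
            if len(best) < beam_width or d_nb < -best[0][0]:
                heapq.heappush(frontier, (d_nb, nb))
                heapq.heappush(best, (-d_nb, nb))
                if len(best) > beam_width:
                    heapq.heappop(best)  # evict the farthest beam entry
    return sorted((-neg_d, n) for neg_d, n in best)

def build_incrementally(vectors, max_degree=2, beam_width=4):
    """Insert vectors one by one, linking each to the closest nodes found
    by searching the graph built so far (node 0 is the assumed seed)."""
    graph = {0: []}
    for new_id in range(1, len(vectors)):
        found = beam_search(vectors, graph, vectors[new_id], 0, beam_width)
        graph[new_id] = [n for _, n in found[:max_degree]]
        for n in graph[new_id]:
            graph[n].append(new_id)  # back edge; uncapped in this sketch
    return graph

vectors = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
graph = build_incrementally(vectors)
result = beam_search(vectors, graph, (4.2,), seed=0, beam_width=3)
result_ids = [n for _, n in result]
print(result_ids)
```

A diversification step would normally replace the plain "closest `max_degree` nodes" selection, and back edges would be pruned to bound the degree. Both are omitted here to keep the sketch short.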