GPU-Accelerated Algorithms for Graph Vector Search: Taxonomy, Empirical Study, and Research Directions

📅 2026-02-10
🤖 AI Summary
This work addresses the lack of systematic optimization and end-to-end performance evaluation of graph-based vector search algorithms on modern GPU architectures. It presents the first comprehensive taxonomy framework for GPU-accelerated graph-structured approximate nearest neighbor search (ANNS) algorithms, offering in-depth analysis of the mapping between algorithmic tasks and GPU hardware resources. Through implementations of six representative algorithms across eight large-scale datasets and fine-grained performance profiling, the study identifies distance computation and CPU–GPU data transfer as critical performance bottlenecks and elucidates the trade-offs between scalability and memory usage. The paper further provides practical design guidelines for deploying GPU-based ANNS systems and releases a comprehensive open-source benchmark to support future research in the community.
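To give a feel for why CPU–GPU data transfer can dominate latency at large scale, the following back-of-envelope sketch estimates the time needed to copy a dense vector set across a host-to-device link. The function name `transfer_seconds`, the dataset shape, and the bandwidth figure are all hypothetical assumptions chosen for illustration; none of them come from the paper's measurements.

```python
def transfer_seconds(num_vectors, dim, bytes_per_value, bandwidth_bytes_per_s):
    """Estimate the time to copy a dense vector set over a link.

    All arguments are illustrative assumptions, not figures from the paper.
    """
    total_bytes = num_vectors * dim * bytes_per_value
    return total_bytes / bandwidth_bytes_per_s


# Hypothetical workload: 100 million float32 vectors with 128 dimensions,
# copied over a link assumed to sustain 16e9 bytes per second.
estimate = transfer_seconds(100_000_000, 128, 4, 16e9)
print(f"{estimate:.1f} s")  # 51.2e9 bytes / 16e9 bytes per second = 3.2 s
```

Under these assumed numbers the copy alone takes seconds, which is far longer than a single query typically takes to answer, so any design that moves vectors across the link at query time pays this cost repeatedly.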

📝 Abstract
Approximate Nearest Neighbor Search (ANNS) underpins many large-scale data mining and machine learning applications, with efficient retrieval increasingly hinging on GPU acceleration as dataset sizes grow. Although graph-based approaches represent the state of the art in approximate nearest neighbor search, there is a lack of systematic understanding regarding their optimization for modern GPU architectures and their end-to-end effectiveness in practical scenarios. In this work, we present a comprehensive survey and experimental study of GPU-accelerated graph-based vector search algorithms. We establish a detailed taxonomy of GPU optimization strategies and clarify the mapping between algorithmic tasks and hardware execution units within GPUs. Through a thorough evaluation of six leading algorithms on eight large-scale benchmark datasets, we assess both graph index construction and query search performance. Our analysis reveals that distance computation remains the primary computational bottleneck, while data transfer between the host CPU and GPU emerges as the dominant factor influencing real-world latency at large scale. We also highlight key trade-offs in scalability and memory usage across different system designs. Our findings offer clear guidelines for designing scalable and robust GPU-powered approximate nearest neighbor search systems, and provide a comprehensive benchmark for the knowledge discovery and data mining community.
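As a rough illustration of the search pattern that graph-based ANNS methods share, here is a minimal CPU-only sketch of best-first search over a proximity graph. It is not one of the six evaluated algorithms and contains no GPU code; the names `squared_l2`, `build_knn_graph`, `graph_search`, and the `beam_width` parameter are hypothetical. The sketch counts distance evaluations to show where the work concentrates: every newly visited neighbor costs one full distance computation.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_knn_graph(vectors, degree):
    """Brute-force k-NN graph, for illustration only.

    Real indexes build the graph approximately rather than exhaustively.
    """
    graph = []
    for i, v in enumerate(vectors):
        order = sorted((j for j in range(len(vectors)) if j != i),
                       key=lambda j: squared_l2(v, vectors[j]))
        graph.append(order[:degree])
    return graph


def graph_search(vectors, graph, query, entry, k, beam_width):
    """Best-first search; returns (top-k ids, number of distance evaluations)."""
    d0 = squared_l2(query, vectors[entry])
    n_dist = 1
    visited = {entry}
    frontier = [(d0, entry)]   # min-heap of candidates still to expand
    results = [(-d0, entry)]   # max-heap (negated) of the best beam_width nodes
    while frontier:
        d, node = heapq.heappop(frontier)
        # Stop once the closest unexpanded candidate is worse than the
        # worst result kept in a full beam.
        if len(results) >= beam_width and d > -results[0][0]:
            break
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = squared_l2(query, vectors[nb])  # one distance per new node
            n_dist += 1
            if len(results) < beam_width or dn < -results[0][0]:
                heapq.heappush(frontier, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > beam_width:
                    heapq.heappop(results)  # drop the current worst result
    top = sorted((-nd, i) for nd, i in results)[:k]
    return [i for _, i in top], n_dist
```

Calling `graph_search(vectors, graph, query, entry=0, k=5, beam_width=16)` returns the ids of the five closest nodes found, ordered by increasing distance, plus the number of distance evaluations performed. The distance evaluations for the neighbors of an expanded node are independent of one another, which is the part of the loop that maps most naturally onto parallel hardware.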

Problem

Research questions and friction points this paper is trying to address.

Approximate Nearest Neighbor Search
Graph-based Algorithms
GPU Acceleration
Vector Search
Scalability

Innovation

Methods, ideas, or system contributions that make the work stand out.

GPU acceleration
graph-based ANNS
taxonomy of optimization
distance computation bottleneck
CPU-GPU data transfer