AI Summary
High-dimensional neural embeddings improve semantic search accuracy but exacerbate the latency-throughput trade-off in approximate nearest neighbor search (ANNS), especially in online services. Conventional graph-based ANNS methods suffer from synchronization overhead and redundant vertex accesses inherent in the fork-join parallelization model, preventing simultaneously low-latency and high-throughput execution. To address this, we propose the first fully asynchronous graph traversal architecture for ANNS, integrating dynamic dependency-free load balancing, lock-free concurrency control, and incremental neighbor expansion, thereby eliminating synchronization bottlenecks and duplicate vertex visits. Evaluated across multiple benchmark datasets, our method achieves 1.5–1.9× lower latency than state-of-the-art approaches and delivers 2.1–8.9× higher throughput at comparable latency, significantly advancing the real-time performance envelope for high-dimensional ANNS.
Abstract
The increase in the dimensionality of neural embedding models has enhanced the accuracy of semantic search but also amplified the computational demands of Approximate Nearest Neighbor Search (ANNS). This complexity poses significant challenges in online and interactive services, where query latency is a critical performance metric. Traditional graph-based ANNS methods, while effective for managing large datasets, often experience substantial throughput reductions when scaled for intra-query parallelism to minimize latency. This reduction is largely due to inherent inefficiencies in the conventional fork-join parallelism model. To address this problem, we introduce AverSearch, a novel parallel graph-based ANNS framework that overcomes these limitations through a fully asynchronous architecture. Unlike existing frameworks that struggle to balance latency and throughput, AverSearch employs a dynamic workload balancing mechanism that supports continuous, dependency-free processing. This approach not only minimizes latency by eliminating unnecessary synchronization and redundant vertex processing but also sustains high throughput. Our evaluations across various datasets, including both traditional benchmarks and modern large-model-generated datasets, show that AverSearch consistently outperforms current state-of-the-art systems: it achieves 2.1–8.9× higher throughput at comparable latency and reduces minimum latency by 1.5–1.9×.
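To make the contrast with fork-join traversal concrete, below is a minimal, hypothetical Python sketch of asynchronous best-first graph traversal for ANNS. It is not AverSearch's implementation: the toy graph, the function names, and the coarse lock (standing in for the paper's lock-free concurrency control) are all illustrative assumptions. The key property it demonstrates is that worker threads continuously pull the closest unexplored vertex from a shared frontier and push fresh neighbors back, with no per-hop synchronization barrier and a shared visited set that prevents duplicate vertex visits.

```python
# Illustrative sketch only (assumed names and data, not AverSearch's code):
# asynchronous best-first graph traversal without fork-join barriers.
import heapq
import math
import queue
import threading

def async_search(points, edges, query, entry=0, k=2, n_workers=4):
    """Search a proximity graph for the k points nearest to `query`."""
    frontier = queue.PriorityQueue()           # shared work pool of (dist, vertex)
    frontier.put((math.dist(points[entry], query), entry))
    seen = {entry}                             # duplicate-visit filter
    results = []                               # min-heap of (dist, vertex)
    lock = threading.Lock()                    # stand-in for lock-free structures

    def worker():
        # Each worker loops independently: no barrier between "hops",
        # so fast workers are never blocked waiting for slow ones.
        while True:
            try:
                d, v = frontier.get(timeout=0.05)
            except queue.Empty:
                return                         # frontier drained: terminate
            with lock:
                heapq.heappush(results, (d, v))
                fresh = [u for u in edges[v] if u not in seen]
                seen.update(fresh)             # claim neighbors exactly once
            for u in fresh:
                frontier.put((math.dist(points[u], query), u))

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [v for _, v in heapq.nsmallest(k, results)]

# Tiny made-up 2-D graph: a path of six points.
points = {0: (0, 0), 1: (1, 0), 2: (2, 0), 3: (2, 1), 4: (3, 1), 5: (4, 2)}
edges = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(async_search(points, edges, query=(4, 2)))  # → [5, 4]
```

Because every visited vertex is claimed in `seen` before its distance computation is ever repeated, no vertex is expanded twice regardless of how many workers run, which is the duplicate-visit elimination the summary refers to; a real system would replace the coarse lock and `PriorityQueue` with lock-free equivalents.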