Efficient Graph-Based Approximate Nearest Neighbor Search: Achieving Low Latency Without Throughput Loss

📅 2025-04-29
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
High-dimensional neural embeddings enhance semantic search accuracy but exacerbate the latency-throughput trade-off in approximate nearest neighbor search (ANNS), especially in online services. Conventional graph-based ANNS methods suffer from synchronization overhead and redundant vertex accesses inherent in the fork-join parallelization model, hindering simultaneous low-latency and high-throughput execution. To address this, we propose the first fully asynchronous graph traversal architecture for ANNS, integrating dynamic dependency-free load balancing, lock-free concurrency control, and incremental neighbor expansion, thereby eliminating synchronization bottlenecks and duplicate vertex visits. Evaluated across multiple benchmark datasets, our method achieves 1.5-1.9× lower latency than state-of-the-art approaches and delivers 2.1-8.9× higher throughput at equivalent latency levels, significantly advancing real-time performance boundaries for high-dimensional ANNS.

๐Ÿ“ Abstract
The increase in the dimensionality of neural embedding models has enhanced the accuracy of semantic search but also amplified the computational demands of Approximate Nearest Neighbor Search (ANNS). This complexity poses significant challenges in online and interactive services, where query latency is a critical performance metric. Traditional graph-based ANNS methods, while effective for managing large datasets, often experience substantial throughput reductions when scaled with intra-query parallelism to minimize latency, largely due to inherent inefficiencies in the conventional fork-join parallelism model. To address this problem, we introduce AverSearch, a novel parallel graph-based ANNS framework that overcomes these limitations through a fully asynchronous architecture. Unlike existing frameworks that struggle to balance latency and throughput, AverSearch uses a dynamic workload balancing mechanism that supports continuous, dependency-free processing. This approach not only minimizes latency by eliminating unnecessary synchronization and redundant vertex processing but also maintains high throughput. Our evaluations across various datasets, including both traditional benchmarks and modern large-model-generated datasets, show that AverSearch consistently outperforms current state-of-the-art systems: it achieves 2.1-8.9 times higher throughput at comparable latency levels and reduces minimum latency by 1.5-1.9 times.
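The fork-join pattern the abstract critiques can be pictured with a toy example (this is a generic illustration of the baseline pattern, not AverSearch's code): each search hop forks parallel distance computations over the frontier's neighbors, then joins at a barrier before the next hop can begin. Workers left idle at every join are the throughput cost that an asynchronous design avoids.

```python
from concurrent.futures import ThreadPoolExecutor

def fork_join_hop(pool, neighbors, dist_fn, query):
    """One hop of fork-join intra-query parallelism (toy sketch)."""
    # fork: score all neighbors of this hop in parallel
    futures = [pool.submit(dist_fn, n, query) for n in neighbors]
    # join: every worker must finish before the frontier can be updated,
    # so the fastest workers idle until the slowest one returns
    return [(f.result(), n) for f, n in zip(futures, neighbors)]

with ThreadPoolExecutor(max_workers=4) as pool:
    # hypothetical 1-D "vectors": vertex ids double as coordinates
    scored = fork_join_hop(pool, [3, 8, 5], lambda v, q: abs(v - q), 4.0)
    frontier = sorted(scored)  # barrier passed; choose next hop's vertices
```

The per-hop barrier is the synchronization overhead the paper targets: an asynchronous traversal instead lets each worker pull new vertices as soon as it finishes, with no hop boundary.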
Problem

Research questions and friction points this paper is trying to address.

Reducing latency in graph-based ANNS without throughput loss
Overcoming inefficiencies in traditional fork-join parallelism models
Balancing latency and throughput in high-dimensional embedding searches
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fully asynchronous architecture for ANNS
Dynamic workload balancing mechanism
Eliminates synchronization and redundant processing
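The redundant-vertex-processing point above is easiest to see in the baseline traversal itself. Below is a minimal, single-threaded best-first search over a proximity graph (a generic ANNS sketch under assumed inputs, not AverSearch's implementation); the `visited` set is what guarantees each vertex's distance is computed at most once, which is the invariant a concurrent design must preserve without locks:

```python
import heapq

def greedy_search(graph, dist, entry, query, k, ef):
    """Best-first search over a proximity graph (generic ANNS sketch).

    graph: dict mapping vertex id -> list of neighbor ids
    dist:  function(vertex_id, query) -> float distance
    ef:    size of the candidate result pool (ef >= k)
    """
    visited = {entry}              # each vertex is expanded at most once
    d0 = dist(entry, query)
    candidates = [(d0, entry)]     # min-heap: closest frontier vertex first
    results = [(-d0, entry)]       # max-heap (negated): best ef found so far

    while candidates:
        d, v = heapq.heappop(candidates)
        if d > -results[0][0] and len(results) >= ef:
            break                  # frontier can no longer improve results
        for u in graph[v]:
            if u in visited:
                continue           # skip redundant vertex accesses
            visited.add(u)
            du = dist(u, query)
            if len(results) < ef or du < -results[0][0]:
                heapq.heappush(candidates, (du, u))
                heapq.heappush(results, (-du, u))
                if len(results) > ef:
                    heapq.heappop(results)  # evict current worst

    return sorted((-nd, u) for nd, u in results)[:k]
```

For example, on a path graph over points 0..9 with `dist = lambda v, q: abs(v - q)`, a query of 7.2 starting from vertex 0 walks the path and returns vertices 7 and 8 for `k=2`. In a fork-join parallelization, concurrent workers can race on `visited` and re-expand the same vertex; the paper's lock-free concurrency control is aimed at keeping this at-most-once property without barriers.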
Jingjia Luo, Tsinghua University
Mingxing Zhang, Tsinghua University
Kang Chen, Tsinghua University
Xia Liao, Tsinghua University
Yingdi Shan, Tsinghua University
Jinlei Jiang, Department of Computer Science and Technology, Tsinghua University (Cloud Computing, Big Data, Grid Computing, CSCW)
Yongwei Wu, Tsinghua University