AI Summary
High-dimensional neural embeddings improve semantic search accuracy but exacerbate the latency-throughput trade-off in approximate nearest neighbor search (ANNS), especially in online services. Conventional graph-based ANNS methods suffer from synchronization overhead and redundant vertex accesses inherent in the fork-join parallelization model, preventing simultaneously low-latency and high-throughput execution. To address this, we propose the first fully asynchronous graph traversal architecture for ANNS, integrating dynamic dependency-free load balancing, lock-free concurrency control, and incremental neighbor expansion, thereby eliminating synchronization bottlenecks and duplicate vertex visits. Evaluated across multiple benchmark datasets, our method achieves 1.5–1.9× lower latency than state-of-the-art approaches and delivers 2.1–8.9× higher throughput at comparable latency, significantly advancing the real-time performance envelope for high-dimensional ANNS.
Abstract
The increase in the dimensionality of neural embedding models has enhanced the accuracy of semantic search but also amplified the computational demands of Approximate Nearest Neighbor Search (ANNS). This complexity poses significant challenges in online and interactive services, where query latency is a critical performance metric. Traditional graph-based ANNS methods, while effective for managing large datasets, often experience substantial throughput reductions when scaled for intra-query parallelism to minimize latency. This reduction is largely due to inherent inefficiencies in the conventional fork-join parallelism model. To address this problem, we introduce AverSearch, a novel parallel graph-based ANNS framework that overcomes these limitations through a fully asynchronous architecture. Unlike existing frameworks that struggle to balance latency and throughput, AverSearch employs a dynamic workload balancing mechanism that supports continuous, dependency-free processing. This approach not only minimizes latency by eliminating unnecessary synchronization and redundant vertex processing but also sustains high throughput. Our evaluations across various datasets, including both traditional benchmarks and modern large-model-generated datasets, show that AverSearch consistently outperforms current state-of-the-art systems: it achieves 2.1–8.9× higher throughput at comparable latency and reduces minimum latency by 1.5–1.9×.
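To make the contrast with fork-join traversal concrete, below is a minimal, hypothetical Python sketch of asynchronous best-first graph traversal for ANNS. It is not AverSearch's implementation: the toy graph, the function names, and the coarse lock (standing in for the paper's lock-free concurrency control) are all illustrative assumptions. The key property it demonstrates is that worker threads continuously pull the closest unexplored vertex from a shared frontier and push fresh neighbors back, with no per-hop synchronization barrier and a shared visited set that prevents duplicate vertex visits.

```python
# Illustrative sketch only (assumed names and data, not AverSearch's code):
# asynchronous best-first graph traversal without fork-join barriers.
import heapq
import math
import queue
import threading

def async_search(points, edges, query, entry=0, k=2, n_workers=4):
    """Search a proximity graph for the k points nearest to `query`."""
    frontier = queue.PriorityQueue()           # shared work pool of (dist, vertex)
    frontier.put((math.dist(points[entry], query), entry))
    seen = {entry}                             # duplicate-visit filter
    results = []                               # min-heap of (dist, vertex)
    lock = threading.Lock()                    # stand-in for lock-free structures

    def worker():
        # Each worker loops independently: no barrier between "hops",
        # so fast workers are never blocked waiting for slow ones.
        while True:
            try:
                d, v = frontier.get(timeout=0.05)
            except queue.Empty:
                return                         # frontier drained: terminate
            with lock:
                heapq.heappush(results, (d, v))
                fresh = [u for u in edges[v] if u not in seen]
                seen.update(fresh)             # claim neighbors exactly once
            for u in fresh:
                frontier.put((math.dist(points[u], query), u))

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [v for _, v in heapq.nsmallest(k, results)]

# Tiny made-up 2-D graph: a path of six points.
points = {0: (0, 0), 1: (1, 0), 2: (2, 0), 3: (2, 1), 4: (3, 1), 5: (4, 2)}
edges = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(async_search(points, edges, query=(4, 2)))  # → [5, 4]
```

Because every visited vertex is claimed in `seen` before its distance computation is ever repeated, no vertex is expanded twice regardless of how many workers run, which is the duplicate-visit elimination the summary refers to; a real system would replace the coarse lock and `PriorityQueue` with lock-free equivalents.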