DISTRIBUTEDANN: Efficient Scaling of a Single DISKANN Graph Across Thousands of Computers

📅 2025-09-07

🤖 AI Summary

DISTRIBUTEDANN is a distributed vector search service that serves a single 50-billion-vector DISKANN graph index spread across more than a thousand machines, rather than following the conventional approach of partitioning the index and routing each query to a subset of partitions. It is built from two well-understood components: a distributed key-value store and an in-memory approximate nearest neighbor (ANN) index. The service reports a median query latency of 26 ms at more than 100,000 queries per second, which the authors state is 6× more efficient than existing partition-and-route strategies. It has replaced the conventional scale-out architecture used to serve the Bing search engine.

📝 Abstract

We present DISTRIBUTEDANN, a distributed vector search service that searches a single 50-billion-vector graph index spread across more than a thousand machines, offering 26 ms median query latency while processing over 100,000 queries per second. This is 6x more efficient than existing partitioning and routing strategies, which route each vector query to a subset of partitions in a scale-out vector search system. DISTRIBUTEDANN is built using two well-understood components: a distributed key-value store and an in-memory ANN index. DISTRIBUTEDANN has replaced conventional scale-out architectures for serving the Bing search engine, and we share our experience from making this transition.

Problem

Research questions and friction points this paper is trying to address.

Scaling a single large vector graph index across thousands of machines

Achieving low-latency search over billion-scale vector datasets

Improving efficiency over existing distributed vector search systems

Innovation

Methods, ideas, or system contributions that make the work stand out.

Distributed vector search across thousands of machines

Scaling a single 50-billion-vector graph index

Combines distributed key-value store with ANN index
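
The abstract does not describe the search procedure, so the following is only a minimal sketch of the general idea of traversing one graph whose nodes live in a key-value store partitioned across machines. It uses a generic best-first (beam) search, not necessarily the paper's algorithm, and it does not model the in-memory ANN index. `ShardedKVStore`, `beam_search`, and all parameters are hypothetical names introduced for illustration; the "shards" are plain in-process dictionaries standing in for remote machines.

```python
import math


class ShardedKVStore:
    """Toy stand-in for a distributed key-value store (assumption: hash sharding)."""

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]
        self.reads = 0  # counts lookups, a rough proxy for network round trips

    def _shard(self, key):
        return self.shards[key % len(self.shards)]

    def put(self, key, value):
        self._shard(key)[key] = value

    def get(self, key):
        self.reads += 1
        return self._shard(key)[key]


def beam_search(store, query, entry_id, beam_width=3, k=2):
    """Best-first graph search; each record is (vector, neighbor_ids)."""
    records = {}  # per-query cache so each node is fetched at most once

    def fetch(node_id):
        if node_id not in records:
            records[node_id] = store.get(node_id)
        return records[node_id]

    def score(node_id):
        vector, _ = fetch(node_id)
        return math.dist(query, vector)

    beam = [(score(entry_id), entry_id)]
    expanded = set()
    while True:
        frontier = [item for item in beam if item[1] not in expanded]
        if not frontier:
            break  # every node in the beam has been expanded
        _, node_id = min(frontier)  # closest unexpanded candidate
        expanded.add(node_id)
        _, neighbors = fetch(node_id)
        seen = {nid for _, nid in beam}
        for nid in neighbors:
            if nid not in seen:
                beam.append((score(nid), nid))
                seen.add(nid)
        beam = sorted(beam)[:beam_width]  # keep only the closest candidates
    return [nid for _, nid in beam[:k]]


# Usage: ten points on a line, each linked to its immediate neighbors,
# spread over four toy shards.
store = ShardedKVStore(num_shards=4)
for i in range(10):
    neighbors = [j for j in (i - 1, i + 1) if 0 <= j < 10]
    store.put(i, ((float(i), 0.0), neighbors))

result = beam_search(store, query=(6.2, 0.0), entry_id=0)
print(result, store.reads)
```

In this sketch every hop of the traversal is a lookup that may land on a different shard, which is why the read counter is tracked: in a real deployment the number and batching of such lookups would dominate latency.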
