Proximity Graphs for Similarity Search: Fast Construction, Lower Bounds, and Euclidean Separation

📅 2025-09-09
🤖 AI Summary
This work studies the theoretical limits and efficient construction of proximity graphs for approximate nearest neighbor (ANN) search. For metric spaces with doubling dimension λ = O(1), it gives an algorithm that builds a (1+ε)-ANN graph with O((1/ε)^λ · n log Δ) edges and (1/ε)^λ · polylog Δ query time, where n is the input size and Δ its aspect ratio. Construction time is near-linear in n, improving on the Ω(n²) bounds of previous constructions. Complementary lower bounds show that Ω((1/ε)^λ · n + n log Δ) edges are necessary in the worst case, up to a subpolynomial factor. Because the hard inputs behind these lower bounds are non-geometric, the paper then uses geometry to reduce the graph size in Euclidean space to O((1/ε)^λ · n) while keeping nearly the same query and construction time. Key contributions are: (i) a near-linear-time construction of (1+ε)-ANN proximity graphs; (ii) worst-case lower bounds on the number of edges; and (iii) a smaller graph for Euclidean space that avoids the log Δ factor in the edge count.

📝 Abstract
Proximity graph-based methods have emerged as a leading paradigm for approximate nearest neighbor (ANN) search in the system community. This paper presents fresh insights into the theoretical foundation of these methods. We describe an algorithm to build a proximity graph for $(1+\epsilon)$-ANN search that has $O((1/\epsilon)^\lambda \cdot n \log \Delta)$ edges and guarantees $(1/\epsilon)^\lambda \cdot \text{polylog } \Delta$ query time. Here, $n$ and $\Delta$ are the size and aspect ratio of the data input, respectively, and $\lambda = O(1)$ is the doubling dimension of the underlying metric space. Our construction time is near-linear to $n$, improving the $\Omega(n^2)$ bounds of all previous constructions. We complement our algorithm with lower bounds revealing an inherent limitation of proximity graphs: the number of edges needs to be at least $\Omega((1/\epsilon)^\lambda \cdot n + n \log \Delta)$ in the worst case, up to a subpolynomial factor. The hard inputs used in our lower-bound arguments are non-geometric, thus prompting the question of whether improvement is possible in the Euclidean space (a key subclass of metric spaces). We provide an affirmative answer by using geometry to reduce the graph size to $O((1/\epsilon)^\lambda \cdot n)$ while preserving nearly the same query and construction time.
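
To make the object of study concrete, here is a minimal sketch of how a query is typically answered on a proximity graph: greedy routing, which repeatedly moves to the out-neighbor closest to the query. The abstract does not spell out the paper's query procedure or its construction, so the functions `greedy_search`, `complete_graph`, and `path_graph` and the toy point set are illustrative assumptions, not the authors' algorithm.

```python
import math

def greedy_search(points, graph, query, start):
    """Greedy routing: hop to the out-neighbor closest to the query and
    stop once no out-neighbor is closer than the current vertex.
    Returns (index of the final vertex, number of hops taken)."""
    current = start
    current_d = math.dist(points[current], query)
    hops = 0
    while True:
        best, best_d = current, current_d
        for nb in graph[current]:
            d = math.dist(points[nb], query)
            if d < best_d:
                best, best_d = nb, d
        if best == current:          # local minimum: no neighbor improves
            return current, hops
        current, current_d = best, best_d
        hops += 1

def complete_graph(n):
    """Hypothetical dense baseline: n * (n - 1) directed edges."""
    return {i: [j for j in range(n) if j != i] for i in range(n)}

def path_graph(n):
    """Hypothetical sparse baseline: each vertex links to its neighbors in order."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

# Toy data set: five points on a line (an assumption for illustration).
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
query = (2.9, 0.1)

dense_result, dense_hops = greedy_search(points, complete_graph(5), query, start=0)
sparse_result, sparse_hops = greedy_search(points, path_graph(5), query, start=0)
print(dense_result, dense_hops)    # nearest point found in one hop
print(sparse_result, sparse_hops)  # same answer, but more hops with fewer edges
```

The two toy graphs show the trade-off the abstract quantifies: more edges shorten the walk but enlarge the graph, and the paper's bounds describe how few edges suffice while still guaranteeing a $(1+\epsilon)$-approximate answer.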
Problem

Research questions and friction points this paper is trying to address.

Efficiently constructing proximity graphs for approximate nearest neighbor search
Establishing lower bounds on graph edges for worst-case scenarios
Reducing graph size using Euclidean space geometric properties
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fast near-linear construction algorithm for proximity graphs
Lower bounds reveal an inherent limit on how few edges a proximity graph can have
Geometry reduces graph size while preserving performance
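
For a rough sense of scale, the edge-count expressions from the abstract can be evaluated with every hidden constant set to 1 and the logarithm taken base 2. Both choices are assumptions, as are the function name `edge_bounds` and the sample parameters. The lower bound holds only up to a subpolynomial factor, so these numbers are orders of magnitude, not exact counts.

```python
import math

def edge_bounds(n, eps, lam, aspect_ratio):
    """Evaluate the abstract's edge-count expressions with constants set to 1.
    n: input size, eps: approximation parameter, lam: doubling dimension,
    aspect_ratio: the aspect ratio Delta of the input."""
    scale = (1.0 / eps) ** lam
    log_delta = math.log2(aspect_ratio)   # base 2 is an assumption
    return {
        "general_upper": scale * n * log_delta,      # O((1/eps)^lam * n log Delta)
        "general_lower": scale * n + n * log_delta,  # Omega((1/eps)^lam * n + n log Delta)
        "euclidean_upper": scale * n,                # O((1/eps)^lam * n)
        "complete_graph": n * (n - 1),               # quadratic baseline
    }

# Sample parameters chosen only for illustration.
bounds = edge_bounds(n=1_000_000, eps=0.5, lam=2, aspect_ratio=2 ** 20)
for name, value in bounds.items():
    print(f"{name:>16}: {value:.3g}")
```

With these sample values the Euclidean expression falls below the general-metric lower bound, which is consistent with the abstract: the hard inputs for the lower bound are non-geometric, so the bound does not constrain Euclidean inputs.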