AI Summary
This work addresses the dependence of approximate nearest-neighbor (ANN) query time on the spread of the point set, a quantity that can be unbounded in the number of points n. Building on recent linear-size nearest-neighbor graphs, whose query time is logarithmic in the spread, the authors construct an external linear-size data structure that is used together with the graph. The combination keeps O(n) space while answering ANN queries in O(log n) time, so the query bound depends on n rather than on the spread.
Abstract
Recent work showed how to construct nearest-neighbor graphs of linear size on a given set $P$ of $n$ points in $\mathbb{R}^d$, such that one can answer approximate nearest-neighbor (ANN) queries in time logarithmic in the spread. Unfortunately, the spread might be unbounded in $n$, and an interesting theoretical question is how to remove the dependency on the spread. Here, we show how to construct an external linear-size data structure that, combined with the linear-size graph, allows us to answer ANN queries in time logarithmic in $n$.
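To make the role of the spread concrete, the following is a minimal illustrative sketch and not part of the paper's construction. It assumes the standard definition of spread as the ratio of the largest to the smallest pairwise distance in $P$. The helper names `spread` and `exponential_points` are hypothetical, chosen only for this example.

```python
import math
from itertools import combinations


def spread(points):
    """Ratio of the largest to the smallest pairwise distance.

    Assumes at least two points and no duplicates (otherwise the
    smallest distance is zero and the spread is undefined).
    """
    dists = [math.dist(p, q) for p, q in combinations(points, 2)]
    return max(dists) / min(dists)


def exponential_points(n):
    """Hypothetical example input: n points on a line at 1, 2, 4, ..., 2^(n-1)."""
    return [(float(2 ** i),) for i in range(n)]


if __name__ == "__main__":
    # Compare log2(spread) with log2(n) as n grows.
    for n in (4, 8, 16, 32):
        s = spread(exponential_points(n))
        print(f"n={n:2d}  log2(spread)={math.log2(s):6.2f}  log2(n)={math.log2(n):5.2f}")
```

For this input the spread is $2^{n-1} - 1$, so a bound that is logarithmic in the spread is roughly linear in $n$, whereas a bound that is logarithmic in $n$ is unaffected by how the points are spaced.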