AI Summary
This paper studies average-distortion sketching in metric spaces: designing compact sketches of points such that, for any fixed (but arbitrary) distribution μ over the metric, the sketch (i) never overestimates pairwise distances and (ii) approximately preserves the average distance between points drawn from μ, up to a multiplicative factor. We formally introduce this notion, which circumvents lower bounds that hold for worst-case sketching. Our method combines randomized ℓₚ-projection, a data-dependent variant of locality-sensitive hashing (LSH), and probabilistic certificates of farness. For the metric space ([Δ]ᵈ, ℓₚ) with p ∈ (2, ∞), it achieves approximation c for any c larger than a fixed constant, with bit complexity poly(2^{p/c} · log(dΔ)). Consequently, the approximation of sublinear-time ℓₚ nearest-neighbor search improves from the prior best O(p) to any such c, irrespective of p, using n^{O(p/c)} space.
Abstract
We introduce average-distortion sketching for metric spaces. As in (worst-case) sketching, these algorithms compress points in a metric space while approximately recovering pairwise distances. The novelty is studying average-distortion: for any fixed (yet, arbitrary) distribution $\mu$ over the metric, the sketch should not over-estimate distances, and it should (approximately) preserve the average distance with respect to draws from $\mu$. The notion generalizes average-distortion embeddings into $\ell_1$ [Rabinovich '03, Kush-Nikolov-Tang '21] as well as data-dependent locality-sensitive hashing [Andoni-Razenshteyn '15, Andoni-Naor-Nikolov-et-al. '18], which have been recently studied in the context of nearest neighbor search.
$\bullet$ For all $p \in (2, \infty)$ and any $c$ larger than a fixed constant, we give an average-distortion sketch for $([\Delta]^d, \ell_p)$ with approximation $c$ and bit-complexity $\text{poly}(2^{p/c} \cdot \log(d\Delta))$, which is provably impossible in (worst-case) sketching.
$\bullet$ As an application, we improve on the approximation of sublinear-time data structures for nearest neighbor search over $\ell_p$ (for large $p>2$). The prior best approximation was $O(p)$ [Andoni-Naor-Nikolov-et-al. '18, Kush-Nikolov-Tang '21], and we show it can be any $c$ larger than a fixed constant (irrespective of $p$) by using $n^{O(p/c)}$ space. We give some evidence that $2^{\Omega(p/c)}$ space may be necessary by giving a lower bound on average-distortion sketches which produce a certain probabilistic certificate of farness (which our sketches crucially rely on).
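The two requirements on an average-distortion sketch (never over-estimate a distance, and approximately preserve the average distance under a distribution) can be illustrated with a toy example. The sketch below is not the paper's construction: it is a hypothetical coordinate-sampling sketch chosen only because its estimate is provably at most the $\ell_p$ distance. The names `make_sketch`, `estimate`, and `average_distortion`, and all parameter values, are assumptions made for illustration, and the empirical set of points stands in for the distribution.

```python
import random


def lp_dist(x, y, p):
    """Exact l_p distance between two integer vectors."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def make_sketch(d, k, rng):
    """Toy sketch (an assumption, not the paper's): keep k shared random coordinates."""
    coords = [rng.randrange(d) for _ in range(k)]
    return lambda x: tuple(x[i] for i in coords)


def estimate(sx, sy):
    """Largest gap on the sampled coordinates.

    This is at most the l_infinity distance, which is at most the l_p
    distance, so the estimate never over-estimates.
    """
    return max(abs(a - b) for a, b in zip(sx, sy))


def average_distortion(points, sketch, p):
    """Ratio (average true distance) / (average estimate) over all pairs of points."""
    sketches = [sketch(x) for x in points]
    true_sum = est_sum = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            true_sum += lp_dist(points[i], points[j], p)
            est_sum += estimate(sketches[i], sketches[j])
    return float("inf") if est_sum == 0 else true_sum / est_sum


if __name__ == "__main__":
    rng = random.Random(0)
    d, delta, p, k = 16, 100, 8, 4  # illustrative values only
    # Empirical stand-in for the distribution: 50 uniform points in [delta]^d.
    points = [[rng.randint(1, delta) for _ in range(d)] for _ in range(50)]
    sketch = make_sketch(d, k, rng)
    print("average distortion:", average_distortion(points, sketch, p))
```

Because every estimate is at most the true distance, the ratio returned by `average_distortion` is always at least 1. An average-distortion sketch with approximation $c$ would keep this ratio at most $c$ for every distribution, a guarantee this toy sketch does not claim to provide.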