🤖 AI Summary
This work studies how well the Ulam and Cayley similarities on permutations of $n$ elements can be approximated within the locality-sensitive hashing (LSH) framework, measured by their multiplicative distortion relative to similarity functions that admit exact LSH schemes. For the Ulam similarity, it establishes a sublinear upper bound of $O(n/\sqrt{\log n})$ and a lower bound of $\Omega(n^{0.12})$ on the LSH distortion. In contrast, it proves that the LSH distortion of the Cayley similarity is $\Theta(n)$. Together, these results show that the Ulam similarity can be approximated by an LSH-able similarity with sublinear distortion, whereas for the Cayley similarity linear distortion is unavoidable.
📝 Abstract
Locality-sensitive hashing (LSH) has found widespread use as a fundamental primitive, particularly to accelerate nearest neighbor search. An LSH scheme for a similarity function $S:\mathcal{X} \times \mathcal{X} \to [0,1]$ is a distribution over hash functions on $\mathcal{X}$ with the property that the probability of collision of any two elements $x,y\in \mathcal{X}$ is exactly equal to $S(x,y)$. However, not all similarity functions admit exact LSH schemes. The notion of LSH distortion measures how multiplicatively close a similarity function is to having an LSH scheme.
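As an illustration of the exact-collision property, here is a minimal sketch for a similarity that is known to admit an exact LSH scheme: the Hamming similarity on fixed-length tuples (the fraction of coordinates on which two tuples agree), hashed by sampling one coordinate uniformly at random. This example is not taken from the paper, and the function names are hypothetical. Because the hash family is finite, the collision probability can be computed exactly by enumerating every hash function.

```python
from fractions import Fraction


def hamming_similarity(x, y):
    """Fraction of coordinates on which x and y agree."""
    return Fraction(sum(a == b for a, b in zip(x, y)), len(x))


def hash_family(d):
    """One hash function per coordinate: h_i(x) = x[i].

    Drawing i uniformly from range(d) gives the LSH distribution.
    """
    return [lambda x, i=i: x[i] for i in range(d)]


def collision_probability(x, y):
    """Exact Pr[h(x) == h(y)] for h drawn uniformly from the family."""
    family = hash_family(len(x))
    hits = sum(h(x) == h(y) for h in family)
    return Fraction(hits, len(family))


x = (0, 1, 1, 0)
y = (0, 1, 0, 0)
print(hamming_similarity(x, y))     # 3/4
print(collision_probability(x, y))  # 3/4
```

The two printed values coincide for every pair of tuples, which is exactly what the definition above requires of an LSH scheme.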
In this work, we study the LSH distortion of the Ulam and Cayley similarities, which are popular similarity measures on permutations of $n$ elements. We show that the Ulam similarity admits a sublinear LSH distortion of $O(n / \sqrt{\log n})$; we also prove a lower bound of $\Omega(n^{0.12})$ on the best LSH distortion achievable. On the other hand, we show that the LSH distortion of the Cayley similarity is $\Theta(n)$.
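For readers unfamiliar with the two measures, the sketch below computes the standard Ulam and Cayley distances between permutations. The Ulam distance is $n$ minus the length of the longest common subsequence, and the Cayley distance is $n$ minus the number of cycles of the permutation taking one arrangement to the other. The abstract does not state how the similarities are normalized; the `similarity` helper assumes $1 - d/n$, which is an assumption made here for illustration, and all function names are hypothetical.

```python
from bisect import bisect_left


def ulam_distance(p, q):
    """n minus the longest common subsequence of permutations p and q."""
    pos = {v: i for i, v in enumerate(p)}
    seq = [pos[v] for v in q]  # LCS(p, q) equals the LIS of this sequence
    tails = []  # tails[k] = smallest tail of an increasing run of length k+1
    for x in seq:
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(p) - len(tails)


def cayley_distance(p, q):
    """n minus the number of cycles of the permutation relating p and q."""
    n = len(p)
    pos = {v: i for i, v in enumerate(p)}
    perm = [pos[v] for v in q]
    seen = [False] * n
    cycles = 0
    for i in range(n):
        if not seen[i]:
            cycles += 1
            j = i
            while not seen[j]:
                seen[j] = True
                j = perm[j]
    return n - cycles


def similarity(distance, n):
    """ASSUMPTION: similarity normalized as 1 - d/n (not stated in the abstract)."""
    return (n - distance) / n


p = [0, 1, 2, 3, 4]
q = [1, 2, 3, 4, 0]  # cyclic shift of p
print(ulam_distance(p, q), cayley_distance(p, q))  # 1 4
print(similarity(ulam_distance(p, q), 5), similarity(cayley_distance(p, q), 5))
```

A cyclic shift is a useful test case because the two measures disagree sharply on it: moving a single element undoes the shift, while undoing it with transpositions takes $n - 1$ swaps.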