On the LSH Distortion of Ulam and Cayley Similarities

📅 2026-05-12
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work studies how well the Ulam and Cayley similarities on permutations of $n$ elements can be approximated within the locality-sensitive hashing (LSH) framework. The yardstick is LSH distortion, which measures how multiplicatively close a similarity function is to one that admits an exact LSH scheme. For the Ulam similarity, the paper shows a sublinear upper bound of $O(n/\sqrt{\log n})$ and a lower bound of $\Omega(n^{0.12})$ on the best achievable LSH distortion. For the Cayley similarity, it shows that the LSH distortion is $\Theta(n)$. The Ulam similarity can therefore be approximated by an LSH scheme with sublinear distortion, whereas for the Cayley similarity linear distortion is unavoidable.
📝 Abstract
Locality-sensitive hashing (LSH) has found widespread use as a fundamental primitive, particularly to accelerate nearest neighbor search. An LSH scheme for a similarity function $S:\mathcal{X} \times \mathcal{X} \to [0,1]$ is a distribution over hash functions on $\mathcal{X}$ with the property that the probability of collision of any two elements $x,y\in \mathcal{X}$ is exactly equal to $S(x,y)$. However, not all similarity functions admit exact LSH schemes. The notion of LSH distortion measures how multiplicatively close a similarity function is to having an LSH scheme. In this work, we study the LSH distortion of the Ulam and Cayley similarities, which are popular similarity measures on permutations of $n$ elements. We show that the Ulam similarity admits a sublinear LSH distortion of $O(n / \sqrt{\log n})$; we also prove a lower bound of $\Omega(n^{0.12})$ on the best LSH distortion achievable. On the other hand, we show that the LSH distortion of the Cayley similarity is $\Theta(n)$.
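For readers unfamiliar with the two measures, the sketch below computes the standard Ulam and Cayley distances between two permutations. It is illustrative only and not taken from the paper. The function names `ulam_distance` and `cayley_distance` are hypothetical. The abstract does not state how the paper turns these distances into similarities in $[0,1]$, so that normalization is left out here.

```python
from bisect import bisect_left


def ulam_distance(p, q):
    """Ulam distance: n minus the longest common subsequence of p and q.

    Equivalently, the minimum number of "remove one element and reinsert
    it elsewhere" moves needed to turn p into q.
    """
    pos = {v: i for i, v in enumerate(q)}  # position of each element in q
    seq = [pos[v] for v in p]              # p rewritten in q's coordinates
    tails = []                             # patience sorting: LIS of seq
    for x in seq:
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(p) - len(tails)


def cayley_distance(p, q):
    """Cayley distance: n minus the number of cycles of the permutation
    taking p to q, i.e. the minimum number of transpositions needed.
    """
    n = len(p)
    pos = {v: i for i, v in enumerate(q)}
    perm = [pos[v] for v in p]             # index i in p maps to its index in q
    seen = [False] * n
    cycles = 0
    for i in range(n):
        if not seen[i]:
            cycles += 1
            j = i
            while not seen[j]:             # walk one full cycle
                seen[j] = True
                j = perm[j]
    return n - cycles
```

On `p = [0, 1, 2, 3]` and the cyclic shift `q = [1, 2, 3, 0]` the two measures disagree sharply: one move suffices for Ulam, while Cayley needs three transpositions.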
Problem

Research questions and friction points this paper is trying to address.

LSH distortion
Ulam similarity
Cayley similarity
permutations
locality-sensitive hashing