Faster and Space Efficient Indexing for Locality Sensitive Hashing

📅 2025-03-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Traditional locality-sensitive hashing (LSH) schemes such as ELSH and SRP need O(md) time and memory to compute an m-sized hashcode of a d-dimensional vector. This becomes a bottleneck in large-scale, high-dimensional approximate nearest neighbor search. To address this, the paper integrates Count Sketch and its higher-order variant into LSH hashcode construction. It proposes two new LSH algorithms for each similarity measure: CSELSH and HCSELSH for Euclidean distance, and CSSRP and HCSSRP for cosine similarity. The proposed methods reduce hashcode computation time from O(md) to O(d). They reduce space complexity from O(md) to O(d) for the Count Sketch variants and to O(N·d^{1/N}) for the higher-order variants, and they come with theoretical guarantees. Experiments on multiple real-world datasets indicate that the proposed algorithms speed up hash construction and reduce memory consumption while keeping retrieval accuracy comparable to classical LSH baselines.
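The summary says Count Sketch lets a hashcode be computed in O(d) time with O(d) memory. Below is a minimal Python sketch of that mechanism. It rests on our own assumption about the construction: the dense m × d Gaussian projection of ELSH/SRP is replaced by a count sketch, in which each coordinate i goes to one bucket h(i) with a random sign s(i). The class `CountSketchHasher`, the bucket width `w`, and the offsets `b` are hypothetical. The paper's exact construction and its LSH guarantees may differ.

```python
import numpy as np


class CountSketchHasher:
    """Hypothetical count-sketch-based hashcode generator (illustrative only).

    Assumption: each coordinate i gets one bucket h[i] in [0, m) and one sign
    s[i] in {-1, +1}, as in a standard count sketch. Only h and s (length d)
    plus m offsets are stored, so no m x d matrix is ever built.
    """

    def __init__(self, d, m, w=4.0, seed=0):
        rng = np.random.default_rng(seed)
        self.m, self.w = m, w
        self.h = rng.integers(0, m, size=d)        # bucket index per coordinate
        self.s = rng.choice([-1.0, 1.0], size=d)   # random sign per coordinate
        self.b = rng.uniform(0.0, w, size=m)       # ELSH-style offsets (assumption)

    def sketch(self, x):
        # y[j] = sum over {i : h[i] == j} of s[i] * x[i]; a single O(d) pass.
        return np.bincount(self.h, weights=self.s * x, minlength=self.m)

    def srp_code(self, x):
        # SRP-style bit: sign of each sketched coordinate.
        return (self.sketch(x) >= 0).astype(np.int8)

    def elsh_code(self, x):
        # ELSH-style integer: floor((y_j + b_j) / w) for each bucket j.
        return np.floor((self.sketch(x) + self.b) / self.w).astype(np.int64)
```

Calling `np.bincount` with `weights` performs the scatter-add `y[h[i]] += s[i] * x[i]` in one vectorised pass. The result equals multiplying `x` by a sparse m × d matrix with a single signed nonzero per column.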

📝 Abstract

This work proposes faster and more space-efficient index construction algorithms for LSH for Euclidean distance (a.k.a. ELSH) and cosine similarity (a.k.a. SRP). The index construction step of these LSHs groups data points into the bins of several hash tables based on their hashcodes. To generate an $m$-dimensional hashcode of a $d$-dimensional data point, these LSHs project the data point onto $m$ independent $d$-dimensional random Gaussian vectors and then discretise each resulting inner product. The time and space complexity of both ELSH and SRP for computing an $m$-sized hashcode of a $d$-dimensional vector is therefore $O(md)$, which becomes impractical for large values of $m$ and $d$. To overcome this problem, we propose two alternative LSH hashcode generation algorithms for each of Euclidean distance and cosine similarity: CSELSH and HCSELSH for the former, and CSSRP and HCSSRP for the latter. CSELSH and CSSRP are based on count sketch \cite{count_sketch}, while HCSELSH and HCSSRP use higher-order count sketch \cite{shi2019higher}. These proposals significantly reduce the hashcode computation time from $O(md)$ to $O(d)$. Additionally, CSELSH and CSSRP reduce the space complexity from $O(md)$ to $O(d)$. HCSELSH and HCSSRP reduce it from $O(md)$ to $O(N\sqrt[N]{d})$, where $N \geq 1$ denotes the order of the input/reshaped tensor. Our proposals are backed by strong mathematical guarantees, and we validate their performance through simulations on various real-world datasets.
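For comparison, the baseline procedure described in the abstract can be sketched in Python with NumPy. An $m \times d$ Gaussian matrix is sampled and stored, and each hash value is a discretised inner product. ELSH uses $\lfloor (r^\top x + b)/w \rfloor$ with $b$ drawn uniformly from $[0, w)$, and SRP keeps the sign of $r^\top x$. The function name and the default width `w = 4.0` are illustrative assumptions.

```python
import numpy as np


def dense_gaussian_hashcodes(x, m, w=4.0, seed=0):
    """Baseline ELSH and SRP hashcodes from an m x d Gaussian projection.

    Storing R costs O(md) memory and computing R @ x costs O(md) time.
    The name and the default bucket width w are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    R = rng.standard_normal((m, d))   # m independent d-dimensional Gaussian vectors
    b = rng.uniform(0.0, w, size=m)   # ELSH offsets, uniform in [0, w)
    proj = R @ x                      # m inner products: O(md) multiply-adds
    elsh = np.floor((proj + b) / w).astype(np.int64)  # ELSH: floor((r.x + b) / w)
    srp = (proj >= 0).astype(np.int8)                 # SRP: sign bit of r.x
    return elsh, srp
```

Both the stored matrix `R` and the product `R @ x` scale with $m \cdot d$, which is the $O(md)$ cost the proposals target.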

Problem

Research questions and friction points this paper is trying to address.

Improves time and space efficiency in LSH index construction.

Reduces hashcode computation complexity from O(md) to O(d).

Introduces new hashcode generation algorithms for Euclidean distance (CSELSH, HCSELSH) and cosine similarity (CSSRP, HCSSRP).
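HCSELSH and HCSSRP rely on higher-order count sketch \cite{shi2019higher} and store O(N·d^{1/N}) values. The Python sketch below illustrates where that bound can come from, under our own assumptions: d = n^N, the vector is reshaped into an order-N tensor of shape (n, …, n), and each mode k has its own bucket map h_k and sign map s_k. The names `make_hcs` and `hcs_sketch` are hypothetical. This is not the authors' implementation.

```python
import numpy as np


def make_hcs(n, ms, seed=0):
    """Hypothetical per-mode hash tables for a higher-order count sketch.

    Mode k gets a bucket map h_k: [n] -> [ms[k]] and a sign map s_k: [n] -> {-1, +1}.
    Total storage is 2 * N * n values, i.e. O(N * d**(1/N)) when d = n**N.
    """
    rng = np.random.default_rng(seed)
    hs = [rng.integers(0, mk, size=n) for mk in ms]
    ss = [rng.choice([-1.0, 1.0], size=n) for _ in ms]
    return hs, ss


def _along_mode(v, k, N):
    # Reshape a length-n vector so it broadcasts along mode k of an order-N tensor.
    shape = [1] * N
    shape[k] = -1
    return v.reshape(shape)


def hcs_sketch(x, hs, ss, ms):
    """Sketch a d-vector, viewed as an order-N tensor, into prod(ms) buckets.

    Entry (i_1, ..., i_N) goes to bucket (h_1(i_1), ..., h_N(i_N)) with sign
    s_1(i_1) * ... * s_N(i_N); one pass over the d entries, so O(d) time.
    """
    N, n = len(ms), len(hs[0])
    T = x.reshape((n,) * N)
    flat = np.zeros((1,) * N, dtype=np.int64)  # row-major index into an ms-shaped array
    sign = np.ones((1,) * N)
    for k in range(N):
        flat = flat * ms[k] + _along_mode(hs[k], k, N)
        sign = sign * _along_mode(ss[k], k, N)
    return np.bincount(flat.ravel(), weights=(sign * T).ravel(),
                       minlength=int(np.prod(ms)))
```

The returned vector has length m = ms[0]·…·ms[N-1]. Discretising it, for example with `(y >= 0)` for SRP-style bits, would give a hashcode. That last step is likewise an assumption about how the sketch feeds the hash.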

Innovation

Methods, ideas, or system contributions that make the work stand out.

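Proposes faster LSH index construction algorithms based on count sketch and higher-order count sketch.

Reduces hashcode computation time from O(md) to O(d).

Reduces space complexity from O(md) to O(d) for the count sketch variants, or to O(N·d^{1/N}) for the higher-order count sketch variants.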
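To make the space bounds concrete, the hypothetical Python helper below counts how many random values each approach stores, under simplifying assumptions. The dense baseline stores an m × d matrix. A count sketch stores one bucket index and one sign per coordinate. A higher-order count sketch stores such a pair for each of the N modes of length d^{1/N}. Constant factors and ELSH offsets are ignored, so only the growth rates, not the exact counts, reflect the stated bounds.

```python
def stored_parameters(d: int, m: int, N: int) -> tuple[int, int, int]:
    """Illustrative count of stored random values (hypothetical helper).

    Returns (dense, count_sketch, higher_order) under the simplified model
    described in the lead-in; constant factors are assumptions.
    """
    n = round(d ** (1.0 / N))   # per-mode length d^(1/N)
    if n ** N != d:
        raise ValueError("this sketch assumes d is a perfect N-th power")
    dense = m * d               # m x d Gaussian projection matrix
    count_sketch = 2 * d        # one bucket index + one sign per coordinate
    higher_order = 2 * N * n    # N per-mode tables of buckets + signs
    return dense, count_sketch, higher_order
```

For example, `stored_parameters(1_000_000, 1024, 3)` returns `(1024000000, 2000000, 600)`. With N = 1, the higher-order count equals the count-sketch count in this model.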

🔎 Similar Papers

Hierarchical Locality Sensitive Hashing for Structured Data: A Survey