Improving LSH via Tensorized Random Projection

📅 2024-02-11
🏛️ Acta Informatica
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses the exponential space complexity of conventional locality-sensitive hashing (LSH) in approximate nearest neighbor search over high-order tensors, caused by explicit vectorization. We propose a novel LSH framework leveraging CP and tensor train (TT) decompositions, marking the first integration of low-rank tensor decomposition into LSH design. By operating directly on structured tensor representations, our method avoids vectorization entirely, reducing the hash function parameter size from exponential to polynomial in the tensor order while preserving sensitivity to both Euclidean distance and cosine similarity. We prove that the proposed scheme satisfies the formal LSH definition and offers probabilistic guarantees for approximate nearest neighbor retrieval. Empirical evaluation demonstrates substantial reductions in storage overhead, efficient hashing of low-rank tensors, and strong scalability. The approach thus bridges theoretical rigor with practical deployability for large-scale tensor similarity search.
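The space saving described above can be illustrated with a quick back-of-the-envelope calculation (a hedged sketch, not the paper's code; the mode size, order, and rank values are arbitrary examples). A dense random projection over a vectorized order-N tensor with mode size d needs d^N parameters, while a CP-structured projection with rank R needs only N·d·R:

```python
# Illustrative parameter counts for one random projection (example values,
# not taken from the paper's experiments).
d, N, R = 10, 6, 5  # mode size, tensor order, CP rank

vectorized_params = d ** N  # explicit vectorization: exponential in N
cp_params = N * d * R       # CP-structured projection: polynomial in N

print(vectorized_params)  # 1000000
print(cp_params)          # 300
```

Even at this modest order and mode size, the structured projection is smaller by more than three orders of magnitude, and the gap widens exponentially as N grows.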

📝 Abstract
Locality sensitive hashing (LSH) is a fundamental algorithmic toolkit for approximate nearest neighbour search, used extensively in many large-scale data processing applications such as near-duplicate detection, nearest neighbour search, and clustering. In this work, we propose faster and more space-efficient locality sensitive hash functions for Euclidean distance and cosine similarity on tensor data. The naive approach to LSH for tensor data first reshapes the tensor into a vector and then applies existing LSH methods for vector data, namely $E2LSH$ and $SRP$. However, this approach becomes impractical for higher-order tensors because the size of the reshaped vector is exponential in the order of the tensor, and consequently the size of the LSH parameters grows exponentially as well. To address this problem, we propose two methods each for Euclidean distance and cosine similarity, namely $CP-E2LSH$, $TT-E2LSH$ and $CP-SRP$, $TT-SRP$, building on $CP$ and tensor train $(TT)$ decomposition techniques. Our approaches are space efficient and can be applied efficiently to low-rank $CP$ or $TT$ tensors. We provide a rigorous theoretical analysis of our proposals, establishing their correctness and efficacy.
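For context, the two vector-data LSH families the abstract builds on are standard: $E2LSH$ hashes via a Gaussian random projection with a quantized offset, and $SRP$ (sign random projection) hashes via the sign of a random projection. A minimal sketch (dimension, bucket width, and seed are illustrative choices, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(0)
d, w = 8, 4.0  # vector dimension and E2LSH bucket width (example values)

a = rng.standard_normal(d)  # Gaussian projection vector
b = rng.uniform(0.0, w)     # random offset drawn uniformly from [0, w)

def e2lsh(x):
    # E2LSH hash for Euclidean distance: floor((a . x + b) / w)
    return int(np.floor((a @ x + b) / w))

def srp(x):
    # Sign random projection (SRP) hash for cosine similarity
    return 1 if a @ x >= 0 else 0

x = rng.standard_normal(d)
print(e2lsh(x), srp(x))
```

The tensor-data problem arises because, for an order-$N$ tensor, the projection vector $a$ itself has exponentially many entries once the tensor is vectorized; the paper's $CP$/$TT$ variants replace $a$ with a low-rank structured projection.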
Problem

Research questions and friction points this paper is trying to address.

Improving the space and time efficiency of LSH for tensor data.
Avoiding the exponential growth of LSH parameters with tensor order.
Designing space-efficient LSH families for both Euclidean distance and cosine similarity.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tensorized LSH for Euclidean distance and cosine similarity
Utilizes CP and TT decomposition techniques
Space-efficient for low-rank CP or TT tensors
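The core algebraic fact that makes a CP-structured projection cheap to apply to a low-rank CP tensor is that the inner product between two CP tensors reduces to factor-wise dot products, with no dense tensor ever formed. A minimal sketch of this identity (a generic CP inner-product computation under assumed names and example ranks, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(1)
d, N, Rx, Ra = 4, 3, 2, 2  # mode size, order, CP ranks (example values)

# CP factor matrices: tensor T = sum_r outer(F[0][:, r], ..., F[N-1][:, r]).
X = [rng.standard_normal((d, Rx)) for _ in range(N)]  # data tensor factors
A = [rng.standard_normal((d, Ra)) for _ in range(N)]  # projection factors

def cp_inner(A, X):
    # <A, X> = sum_{s, r} prod_n (A_n[:, s] . X_n[:, r]),
    # computed from the factors alone in O(N * d * Ra * Rx) time.
    M = np.ones((Ra, Rx))
    for An, Xn in zip(A, X):
        M *= An.T @ Xn  # (Ra, Rx) matrix of mode-n inner products
    return M.sum()

def full_tensor(F):
    # Reconstruct the dense tensor from CP factors (for verification only).
    T = 0
    for r in range(F[0].shape[1]):
        v = F[0][:, r]
        for Fn in F[1:]:
            v = np.multiply.outer(v, Fn[:, r])
        T = T + v
    return T

# The structured inner product matches the dense one.
print(np.allclose(cp_inner(A, X), np.sum(full_tensor(A) * full_tensor(X))))
# prints True
```

Feeding such an inner product into the E2LSH or SRP formulas yields a hash of the tensor whose cost scales with the factor sizes rather than with the $d^N$ entries of the dense tensor.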