Dynamic Query Modification for Binary Locality Sensitive Hashing

📅 2026-05-22
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses approximate near neighbour (ANN) search with binary locality-sensitive hashing (LSH), where an unmodified query may collide with few of its true neighbours, or with none at all. The authors propose dynamic query modification, which replaces the original query q at query time with a new point c. The hash output of c collides with near neighbours with greater probability than that of q, and c has little chance of failing to collide with any of them. Building on this mechanism, they define MQ-Forest, a modified version of RP-Forest that dynamically estimates c during the query process. Experiments show that MQ-Forest reduces both build and query times by up to 40% over several large, high-dimensional benchmark datasets.
📝 Abstract
Our context of interest is how binary locality sensitive hash (LSH) functions can be used to solve the approximate near neighbour (ANN) problem, which seeks to find the k closest elements of some dataset X to some further point q presented as a query. Binary locality sensitive function families H are sets of functions, each of which accepts a point and returns a binary value. A function is locality sensitive if and only if its output is more likely to be equal (a 'hash collision') when two close vectors are used as input than when two far vectors are used. A data structure can be built by generating a binary hash code for each member of X, by drawing and applying one or more functions from H. When q is presented as a query, the same set of functions is applied to it and those elements of X with equal binary hash codes are retrieved. In this paper we introduce dynamic query modification. This process changes q at query time to form a new value c, which by theoretical and experimental analysis we prove has two significant advantages. Firstly, the hash output of c collides with near neighbours with a greater probability than q. Secondly, we show there is little chance of c failing to collide with any near neighbours, a property which we demonstrate is not true for q. To demonstrate the efficacy of the technique, we define a novel structure MQ-Forest, a modified version of RP-Forest. Both are binary LSH-based ANN mechanisms, but MQ-Forest dynamically estimates a value for c during the query process. We show that MQ-Forest reduces both build and query times by up to 40% when measured over several large, high-dimensional benchmark datasets.
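The basic binary LSH lookup described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's method. It assumes a random-hyperplane hash family, where each bit is the sign of a random projection, which the abstract does not specify. All function names (`make_hyperplanes`, `hash_code`, `build_index`, `query`) are hypothetical. The sketch covers only the unmodified query q; it does not implement dynamic query modification or MQ-Forest.

```python
import numpy as np
from collections import defaultdict

def make_hyperplanes(dim, n_bits, rng):
    # Assumption: each binary hash function is a random hyperplane through the origin.
    # Each row is the normal vector of one hyperplane.
    return rng.standard_normal((n_bits, dim))

def hash_code(planes, x):
    # One bit per function: 1 if x lies on the positive side of the hyperplane.
    return tuple((planes @ x > 0).astype(int).tolist())

def build_index(planes, X):
    # Group the indices of X by their binary hash code.
    buckets = defaultdict(list)
    for i, x in enumerate(X):
        buckets[hash_code(planes, x)].append(i)
    return buckets

def query(planes, buckets, X, q, k):
    # Retrieve only the elements whose code equals the code of q,
    # then rank those candidates by true Euclidean distance.
    candidates = buckets.get(hash_code(planes, q), [])
    ranked = sorted(candidates, key=lambda i: float(np.linalg.norm(X[i] - q)))
    return ranked[:k]

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 32))       # synthetic data, for illustration only
planes = make_hyperplanes(32, 8, rng)     # 8 hash functions, so 8-bit codes
buckets = build_index(planes, X)
result = query(planes, buckets, X, X[0], k=5)
```

If the bucket matching the query's code is empty, `query` returns nothing, even when true neighbours sit in adjacent buckets. This is the failure to collide with any near neighbour that the abstract attributes to an unmodified q.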
Problem

Research questions and friction points this paper is trying to address.

Approximate Near Neighbour
Binary Locality Sensitive Hashing
Hash Collision
High-dimensional Data
Query Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic Query Modification
Binary Locality Sensitive Hashing
Approximate Nearest Neighbor
MQ-Forest
Hash Collision Probability