Hierarchical Locality Sensitive Hashing for Structured Data: A Survey

📅 2022-04-24
📈 Citations: 1
Influential: 0
🤖 AI Summary
Traditional locality-sensitive hashing (LSH) struggles to preserve the topological and hierarchical relationships among elements of structured data such as sequences, trees, and graphs, which leads to inaccurate similarity estimation. To address this, we present a systematic survey of hierarchical LSH (HLH) for structured data. We organize its development along three dimensions: data structures, application scenarios, and open challenges. Methodologically, we identify four core technical paradigms: (i) multi-granularity encoding, (ii) hierarchical similarity propagation, (iii) structure-aware signature generation, and (iv) recursive hashing via graph/tree decomposition. Based on these, we establish a taxonomy covering more than ten state-of-the-art HLH algorithms and delineate where each one applies. Our work provides both a theoretical framework and practical guidelines for efficient approximate similarity search over structured data.
📝 Abstract
Data similarity (or distance) computation is a fundamental research topic that underpins a variety of similarity-based machine learning and data mining applications. In big data analytics, computing the exact similarity of data instances is impractical because of its high computational cost. To this end, the Locality Sensitive Hashing (LSH) technique has been proposed to provide accurate estimators for various similarity measures between sets or vectors in an efficient manner and without a learning process. Structured data (e.g., sequences, trees and graphs), which are composed of elements and relations between the elements, are common in the real world, but traditional LSH algorithms cannot preserve the structure information represented as relations between elements. To address this issue, researchers have developed the family of hierarchical LSH algorithms. In this paper, we survey the current progress of research into hierarchical LSH from the following perspectives: 1) Data structures, where we review various hierarchical LSH algorithms for three typical data structures and uncover their inherent connections; 2) Applications, where we review the hierarchical LSH algorithms in multiple application scenarios; 3) Challenges, where we discuss some potential challenges as future directions.
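To make the abstract's point concrete, here is a minimal sketch of one classic LSH scheme, MinHash, which estimates the Jaccard similarity between sets without any training. It is an illustration only and is not taken from the survey: the function names, the choice of BLAKE2b as the hash, and the default of 256 hash functions are all assumptions made for this example. The last lines show the limitation the abstract describes: because the input is treated as a flat set, two sequences with the same elements in a different order receive identical signatures.

```python
import hashlib


def _hash(seed: int, item: str) -> int:
    """Seeded 64-bit hash of an item (hypothetical helper for this sketch)."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes=256):
    """Return the MinHash signature of a collection, treated as a set."""
    unique = set(items)
    if not unique:
        raise ValueError("cannot sketch an empty set")
    # One minimum per seeded hash function.
    return [min(_hash(seed, x) for x in unique) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions, an estimate of Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


set_a = {"apple", "banana", "cherry", "date"}
set_b = {"banana", "cherry", "date", "elderberry"}
est = estimate_jaccard(minhash_signature(set_a), minhash_signature(set_b))
exact = exact_jaccard(set_a, set_b)  # 3 shared elements out of 5 in the union

# Order is invisible to a flat, set-based sketch: both sequences below
# contain the same elements, so their signatures are identical.
seq_x = ["a", "b", "c", "d"]
seq_y = ["d", "c", "b", "a"]
same_signature = minhash_signature(seq_x) == minhash_signature(seq_y)
```

The estimate converges to the exact value as `num_hashes` grows, while `same_signature` stays true regardless, which is the gap that hierarchical LSH methods for sequences, trees, and graphs aim to close.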
Problem

Research questions and friction points this paper is trying to address.

Efficient similarity computation for structured data
Preserving structural information in data similarity
Reviewing hierarchical LSH algorithms and applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical LSH preserves the relations between elements of structured data.
Efficient similarity estimation without a learning process.
Focus on sequences, trees, and graphs, and on their applications.
Wei Wu
School of Computer Science and Engineering, Central South University, Changsha 410083, China
Bin Li
School of Computer Science, Fudan University, Shanghai 200433, China