Learning-Based Hashing for ANN Search: Foundations and Early Advances

📅 2025-10-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Approximate nearest neighbor (ANN) search is a core problem in large-scale retrieval across computer vision, natural language processing, and cross-modal search. This paper surveys the early development of learning-based hashing methods, which optimize projection functions and quantization strategies from data so that high-dimensional features map to compact binary codes supporting efficient similarity computation in Hamming space. In contrast to random hashing, where these functions are chosen at random, the surveyed methods are learned from data and fall into supervised, unsupervised, and semi-supervised paradigms. The survey also covers multi-bit and multi-threshold quantization and early cross-modal extensions. Rather than cataloguing the most recent methods, it places early models in their historical context to give readers a structured understanding of the principles, trade-offs, and open challenges that still inform current research.

📝 Abstract

Approximate Nearest Neighbour (ANN) search is a fundamental problem in information retrieval, underpinning large-scale applications in computer vision, natural language processing, and cross-modal search. Hashing-based methods provide an efficient solution by mapping high-dimensional data into compact binary codes that enable fast similarity computations in Hamming space. Over the past two decades, a substantial body of work has explored learning to hash, where projection and quantisation functions are optimised from data rather than chosen at random. This article offers a foundational survey of early learning-based hashing methods, with an emphasis on the core ideas that shaped the field. We review supervised, unsupervised, and semi-supervised approaches, highlighting how projection functions are designed to generate meaningful embeddings and how quantisation strategies convert these embeddings into binary codes. We also examine extensions to multi-bit and multi-threshold models, as well as early advances in cross-modal retrieval. Rather than providing an exhaustive account of the most recent methods, our goal is to introduce the conceptual foundations of learning-based hashing for ANN search. By situating these early models in their historical context, we aim to equip readers with a structured understanding of the principles, trade-offs, and open challenges that continue to inform current research in this area.
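
To make the projection and quantisation steps described in the abstract concrete, the following is a minimal sketch and not a method taken from the survey. It assumes Gaussian random hyperplanes for the projection (the data-independent baseline) and, as one simple illustration of a data-driven quantiser, sets each bit's threshold to the median of the projected training data. All function names (make_projections, fit_thresholds, encode, hamming, search) are hypothetical and chosen only for this example.

```python
import random
from statistics import median


def make_projections(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes (data-independent projection)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def project(x, hyperplanes):
    """Project vector x onto every hyperplane normal (one real value per bit)."""
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in hyperplanes]


def fit_thresholds(train, hyperplanes):
    """Data-driven quantisation (assumption for illustration): one threshold
    per projected dimension, set to the median of the training projections."""
    projected = [project(x, hyperplanes) for x in train]
    return [median(column) for column in zip(*projected)]


def encode(x, hyperplanes, thresholds):
    """Quantise each projection to a single bit and pack the bits into an int."""
    code = 0
    for value, threshold in zip(project(x, hyperplanes), thresholds):
        code = (code << 1) | (1 if value > threshold else 0)
    return code


def hamming(a, b):
    """Hamming distance between two packed codes: popcount of the XOR."""
    return bin(a ^ b).count("1")


def search(query_code, db_codes, k=1):
    """Return indices of the k database codes closest in Hamming distance."""
    order = sorted(range(len(db_codes)),
                   key=lambda i: hamming(query_code, db_codes[i]))
    return order[:k]


# Usage on synthetic data: 200 points in 16 dimensions, 32-bit codes.
rng = random.Random(1)
data = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(200)]
planes = make_projections(dim=16, n_bits=32)
thresholds = fit_thresholds(data, planes)
codes = [encode(x, planes, thresholds) for x in data]
nearest = search(codes[0], codes, k=3)
```

In this sketch only the thresholds depend on the data. The learning-based methods the survey covers go further and optimise the projection itself, but the encode-then-compare-in-Hamming-space pipeline has the same overall shape.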

Problem

Research questions and friction points this paper is trying to address.

Surveying early learning-based hashing methods for ANN search

Exploring projection and quantization functions optimized from data

Introducing conceptual foundations and historical context of hashing

Innovation

Methods, ideas, or system contributions that make the work stand out.

Learning-based hashing optimizes projection and quantization functions

Methods map high-dimensional data into compact binary codes

Supervised, unsupervised, and semi-supervised approaches generate meaningful binary embeddings
