🤖 AI Summary
Approximate nearest neighbor (ANN) search is a fundamental problem in information retrieval, underpinning large-scale applications in computer vision, natural language processing, and cross-modal search. This paper surveys the early development of learning-based hashing methods, which optimize projection functions and quantization strategies from data so that high-dimensional features map to compact binary codes supporting efficient similarity computation in Hamming space. Unlike hashing schemes whose functions are chosen at random, these methods are learned from data; the survey groups them into supervised, unsupervised, and semi-supervised paradigms and covers extensions to multi-bit and multi-threshold quantization as well as early advances in cross-modal retrieval. It places these early models in their historical context and lays out the principles, trade-offs, and open challenges that continue to inform current research. The survey does not attempt an exhaustive account of the most recent methods; its aim is to introduce the conceptual foundations of learning-based hashing for ANN search.
📝 Abstract
Approximate Nearest Neighbour (ANN) search is a fundamental problem in information retrieval, underpinning large-scale applications in computer vision, natural language processing, and cross-modal search. Hashing-based methods provide an efficient solution by mapping high-dimensional data into compact binary codes that enable fast similarity computations in Hamming space. Over the past two decades, a substantial body of work has explored learning to hash, where projection and quantisation functions are optimised from data rather than chosen at random.
This article offers a foundational survey of early learning-based hashing methods, with an emphasis on the core ideas that shaped the field. We review supervised, unsupervised, and semi-supervised approaches, highlighting how projection functions are designed to generate meaningful embeddings and how quantisation strategies convert these embeddings into binary codes. We also examine extensions to multi-bit and multi-threshold models, as well as early advances in cross-modal retrieval.
Rather than providing an exhaustive account of the most recent methods, our goal is to introduce the conceptual foundations of learning-based hashing for ANN search. By situating these early models in their historical context, we aim to equip readers with a structured understanding of the principles, trade-offs, and open challenges that continue to inform current research in this area.
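As a rough illustration of the pipeline described above (project, quantise, compare in Hamming space), the following Python sketch uses random Gaussian projections with a sign threshold at zero. This is the data-independent baseline that learning-based methods aim to improve on, not a method taken from the survey. The function names `make_projections`, `hash_code`, and `hamming`, along with the code length and example vectors, are assumptions made purely for this example.

```python
import random


def make_projections(n_bits, dim, seed=0):
    """Draw n_bits random Gaussian hyperplanes.

    Assumption: random hyperplanes stand in for the learned projection
    functions discussed in the survey.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(x, projections):
    """Project x onto each hyperplane and quantise with a single threshold at zero.

    The bits are packed into one Python integer, most significant bit first.
    """
    code = 0
    for w in projections:
        dot = sum(wi * xi for wi, xi in zip(w, x))
        bit = 1 if dot >= 0 else 0
        code = (code << 1) | bit
    return code


def hamming(a, b):
    """Hamming distance between two integer-packed codes (popcount of XOR)."""
    return bin(a ^ b).count("1")


# Hypothetical 4-dimensional feature vectors, hashed to 16-bit codes.
projections = make_projections(n_bits=16, dim=4)
query = [1.0, 0.5, -0.2, 0.3]
near = [0.9, 0.6, -0.1, 0.3]
opposite = [-v for v in query]

# A vector close to the query tends to share most bits with it, while the
# negated query lands on the other side of every hyperplane.
print(hamming(hash_code(query, projections), hash_code(near, projections)))
print(hamming(hash_code(query, projections), hash_code(opposite, projections)))
```

In this sketch the projection step and the quantisation step are deliberately separate, mirroring the two components the survey treats as learnable: a learning-based method would replace the random hyperplanes, the fixed zero threshold, or both with functions fitted to data.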