🤖 AI Summary
To address the inefficiency and high computational cost of hard negative mining on large-scale, high-dimensional data in contrastive learning, this paper proposes a GPU-friendly, theoretically analyzable binary Locality-Sensitive Hashing (LSH) method. It is the first to employ an LSH scheme with rigorous error-bound guarantees specifically for hard negative sampling. The approach preserves structural similarity via structure-aware quantization from real-valued features to compact binary codes, enabling efficient approximate nearest neighbor search while being natively suited to GPU parallelism. Evaluated on multiple text and vision benchmarks, the method matches or surpasses state-of-the-art retrieval quality, reduces average inference latency by over 40%, and significantly lowers memory footprint, jointly advancing accuracy, efficiency, and deployment practicality.
📝 Abstract
Contrastive learning is a representation learning paradigm in which a neural network maps data elements to feature vectors. It shapes the feature space by forming groups consisting of an anchor and examples that are positive or negative with respect to the anchor's class. Hard negative examples, which lie close to the anchor in the feature space but belong to a different class, improve learning performance. Finding such high-quality examples efficiently in large, high-dimensional datasets is computationally challenging. In this paper, we propose a GPU-friendly Locality-Sensitive Hashing (LSH) scheme that quantizes real-valued feature vectors into binary representations for approximate nearest neighbor search. We investigate its theoretical properties and evaluate it on several datasets from the textual and visual domains. Our approach achieves comparable or better performance while requiring significantly less computation than existing hard negative mining strategies.
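To make the quantization-and-search idea concrete, here is a minimal sketch using random-hyperplane LSH (SimHash), a standard binary LSH scheme: real-valued feature vectors are mapped to binary codes via the signs of random projections, and candidate hard negatives are retrieved by Hamming distance. The paper's structure-aware quantization is not publicly specified here, so the hyperplane construction, bit width, and function names below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def lsh_encode(X, hyperplanes):
    """Quantize real-valued vectors into binary codes: one bit per
    random hyperplane, set when the vector lies on its positive side."""
    return (X @ hyperplanes.T > 0).astype(np.uint8)

def hamming_neighbors(codes, query_code, k=5):
    """Return indices of the k codes closest to query_code in Hamming
    distance (a cheap proxy for cosine similarity under SimHash)."""
    dists = np.count_nonzero(codes != query_code, axis=1)
    return np.argsort(dists)[:k]

# Illustrative setup: 1000 features of dimension 128, 64-bit codes.
d, n_bits = 128, 64
features = rng.standard_normal((1000, d))
hyperplanes = rng.standard_normal((n_bits, d))  # random projection directions

codes = lsh_encode(features, hyperplanes)
anchor_code = codes[0]
candidates = hamming_neighbors(codes, anchor_code, k=5)
# In hard negative mining, one would then keep only candidates whose
# class label differs from the anchor's.
```

Because encoding and Hamming comparison reduce to matrix multiplies and bitwise operations, this style of search parallelizes naturally on GPUs, which is the efficiency argument the abstract makes.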