Discovering Global False Negatives On the Fly for Self-supervised Contrastive Learning

📅 2025-02-28
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In self-supervised contrastive learning, global random sampling often misclassifies semantically similar samples as negative pairs (false negatives), leading to erroneous separation in the embedding space. To address this, we propose the first online dynamic false-negative identification mechanism operating over the entire dataset, overcoming the limitations of conventional intra-batch local constraints. Our approach enables real-time, global false-negative discovery with computational overhead independent of dataset size. Methodologically, it integrates embedding-space similarity modeling, optimization-based adaptive threshold learning, and online gradient updates. Extensive experiments demonstrate substantial improvements in representation quality across image and vision-language multimodal tasks: ResNet-50 achieves a +1.8% top-1 accuracy gain under linear evaluation on ImageNet-1K. The implementation is publicly available.

📝 Abstract
In self-supervised contrastive learning, negative pairs are typically constructed using an anchor image and a sample drawn from the entire dataset, excluding the anchor. However, this approach can result in the creation of negative pairs with similar semantics, referred to as "false negatives", leading to their embeddings being falsely pushed apart. To address this issue, we introduce GloFND, an optimization-based approach that automatically learns on the fly the threshold for each anchor data to identify its false negatives during training. In contrast to previous methods for false negative discovery, our approach globally detects false negatives across the entire dataset rather than locally within the mini-batch. Moreover, its per-iteration computation cost remains independent of the dataset size. Experimental results on image and image-text data demonstrate the effectiveness of the proposed method. Our implementation is available at https://github.com/vibalcam/GloFND.
Problem

Research questions and friction points this paper is trying to address.

Random negative sampling misclassifies semantically similar samples as negatives (false negatives), erroneously pushing their embeddings apart.
Existing false-negative discovery methods operate only locally within the mini-batch, missing false negatives elsewhere in the dataset.
Global detection must keep per-iteration computation cost independent of dataset size.
Innovation

Methods, ideas, or system contributions that make the work stand out.

GloFND detects false negatives globally across the entire dataset.
An optimization-based threshold is learned per anchor on the fly during training.
Per-iteration computational cost remains independent of dataset size.
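The per-anchor threshold learning above can be illustrated as a stochastic quantile update: each anchor's threshold follows a gradient step so that roughly a target fraction of similarities exceeds it. This is a minimal sketch under assumed details (the `alpha` quantile target, learning rate, and function names are hypothetical), not the authors' exact GloFND procedure:

```python
import numpy as np

def update_thresholds(sim, lam, alpha=0.01, lr=0.05):
    """One online update of per-anchor thresholds (hypothetical sketch).

    sim: (B, K) cosine similarities between B anchors and K sampled embeddings.
    lam: (B,) current per-anchor thresholds.

    Each threshold descends the pinball-style objective
    alpha * lam + E[relu(s - lam)], whose gradient w.r.t. lam is
    alpha - P(s > lam); at the optimum, lam sits at the
    (1 - alpha)-quantile of the anchor's similarity distribution.
    """
    frac_above = (sim > lam[:, None]).mean(axis=1)  # estimate P(s > lam)
    return lam - lr * (alpha - frac_above)

def false_negative_mask(sim, lam):
    """Flag samples whose similarity exceeds the learned threshold."""
    return sim > lam[:, None]
```

Because each update touches only the similarities sampled in the current iteration, the cost stays independent of the dataset size, matching the property claimed above.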
Vicente Balmaseda
Department of Computer Science and Engineering, Texas A&M University, Texas, USA
Bokun Wang
Texas A&M University
Machine Learning, Artificial Intelligence, Multimodal Machine Learning
Ching-Long Lin
Department of Mechanical Engineering, University of Iowa, Iowa, USA
Tianbao Yang
Texas A&M University
machine learning, stochastic optimization