Enhancing Self-Supervised Learning with Semantic Pairs: A New Dataset and Empirical Study

📅 2025-10-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a limitation of instance discrimination, a dominant self-supervised learning paradigm: it struggles to capture high-level semantic invariances. The authors propose a new dataset and evaluation framework built on *semantic pairs*: positive sample pairs explicitly constructed to reflect meaningful semantic relationships (e.g., fine-grained subclasses within the same category or cross-domain synonymous instances) rather than produced by stochastic data augmentation alone. These semantically informed pairs are integrated into contrastive learning to guide encoders toward representations that are invariant at the semantic level. Reported experiments show improved transfer performance on downstream tasks, including ImageNet classification and PASCAL VOC detection, with an average gain of +2.3% over strong baselines (e.g., SimCLR, MoCo). The publicly released dataset is intended to fill a gap in semantic-aware self-supervised evaluation and to serve as a benchmark supporting the shift from low-level feature modeling toward semantic understanding in self-supervised representation learning.
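As a rough illustration of how explicitly paired positives can enter a contrastive objective, the sketch below computes an InfoNCE-style loss in NumPy. It assumes a SimCLR-like formulation in which row `i` of `positives` is the semantic partner of row `i` of `anchors` and every other row in the batch acts as a negative. The function name `info_nce`, the temperature value, and the random embeddings are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE-style loss: positives[i] is the positive for anchors[i];
    all other rows of `positives` serve as negatives for that anchor."""
    # L2-normalise so the dot product is a cosine similarity.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature
    # Subtract the row-wise max for numerical stability before the softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The diagonal holds the log-probability of each true pair.
    return float(-np.mean(np.diag(log_prob)))

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(8, 16))  # hypothetical encoder outputs

# Partners that lie close to their anchors (stand-in for well-matched semantic pairs).
loss_aligned = info_nce(embeddings, embeddings + 0.01 * rng.normal(size=(8, 16)))
# Partners unrelated to their anchors.
loss_random = info_nce(embeddings, rng.normal(size=(8, 16)))
```

Under these assumptions, the loss is low when each anchor is closer to its own partner than to the other samples in the batch, which is the pressure that would push an encoder to map semantically related images together.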

📝 Abstract
Instance discrimination is a self-supervised representation learning paradigm wherein individual instances within a dataset are treated as distinct classes. This is typically achieved by generating two disparate views of each instance by applying stochastic transformations, which encourages the model to learn representations that are invariant to the common underlying object across these views.
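The two-view setup described in the abstract can be sketched as follows. This is a minimal NumPy example that assumes random cropping and horizontal flipping as the stochastic transformations; the helper names `random_view` and `two_views`, the crop size, and the random input batch are hypothetical and only stand in for whatever augmentation pipeline a given method uses.

```python
import numpy as np

def random_view(image, rng, crop=24):
    """Return one stochastic view: a random crop followed by a random horizontal flip."""
    h, w = image.shape[:2]
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    view = image[top:top + crop, left:left + crop]
    if rng.random() < 0.5:
        view = view[:, ::-1]  # flip along the width axis
    return view

def two_views(batch, rng, crop=24):
    """Produce two independently augmented views of every instance in the batch."""
    view_a = np.stack([random_view(img, rng, crop) for img in batch])
    view_b = np.stack([random_view(img, rng, crop) for img in batch])
    return view_a, view_b

rng = np.random.default_rng(0)
batch = rng.random((4, 32, 32, 3))  # four hypothetical 32x32 RGB images
view_a, view_b = two_views(batch, rng)
```

Here `view_a[i]` and `view_b[i]` come from the same instance and form a positive pair, while views of different instances are treated as belonging to different classes.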

Problem

Research questions and friction points this paper is trying to address.

Proposes semantic pairs to enhance self-supervised learning methods
Introduces a new dataset for empirical study of representation learning
Improves instance discrimination by learning invariant representations across views

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces semantic pairs dataset for self-supervised learning
Uses instance discrimination with stochastic view generation
Learns invariant representations across augmented instance views
Mohammad Alkhalefi
Department of Computing Science, University of Aberdeen
Georgios Leontidis
Department of Computing Science & Interdisciplinary Institute, University of Aberdeen
Mingjun Zhong
Department of Computing Science, University of Aberdeen, UK
Applied Statistics, Machine Learning