🤖 AI Summary
This work addresses a limitation of instance discrimination, a dominant self-supervised learning paradigm: its difficulty in capturing high-level semantic invariances. To this end, we propose a dataset and evaluation framework grounded in *semantic pairs*: positive pairs constructed to reflect meaningful semantic relationships (e.g., fine-grained subclasses of the same category, or cross-domain synonymous instances) rather than derived from stochastic data augmentations. Integrating these semantically informed pairs into contrastive learning explicitly guides encoders toward representations that are invariant at the semantic level. Experiments show substantially improved transfer performance on downstream tasks, including ImageNet classification and PASCAL VOC detection, with an average gain of +2.3% over strong baselines such as SimCLR and MoCo. Furthermore, the publicly released dataset fills a critical gap in semantic-aware self-supervised evaluation, providing a benchmark that supports the shift from low-level feature modeling toward semantic understanding in self-supervised representation learning.
📝 Abstract
Instance discrimination is a self-supervised representation learning paradigm in which each individual instance in a dataset is treated as its own class. This is typically achieved by applying stochastic transformations to produce two distinct views of each instance, encouraging the model to learn representations that are invariant to the transformations and consistent for the common underlying object across the two views.
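The two-view objective described above is commonly instantiated as a contrastive (NT-Xent style) loss, in which each view's embedding must identify its counterpart among all other embeddings in the batch. Below is a minimal NumPy sketch of that loss — an illustrative implementation under stated assumptions (function name, temperature value, and batch layout are hypothetical), not the paper's actual code:

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """Contrastive loss over two views (NT-Xent style sketch).

    z1, z2: (N, D) embeddings of the two stochastic views; row i of z1
    and row i of z2 come from the same instance (a positive pair).
    All other rows in the batch act as negatives.
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)               # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize rows
    sim = z @ z.T / temperature                        # scaled cosine sims
    np.fill_diagonal(sim, -np.inf)                     # mask self-similarity
    # The positive for row i is row i+N (and vice versa).
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # Cross-entropy: -log softmax(sim)[i, pos[i]], averaged over all rows.
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    return (logsumexp - sim[np.arange(2 * n), pos]).mean()
```

With only one pair in the batch (N=1) there are no negatives after masking, so the loss is exactly zero; with larger batches, the loss decreases as the two views of each instance are embedded closer together relative to the negatives.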