Semantic Positive Pairs for Enhancing Visual Representation Learning of Instance Discrimination methods

📅 2023-06-28
🏛️ Trans. Mach. Learn. Res.
📈 Citations: 2
Influential: 0

🤖 AI Summary
In instance-discrimination self-supervised learning, positive pairs built only from data augmentation can cause semantically similar samples to be repelled from one another, discarding class-discriminative features. To address this, we propose a framework that explicitly adds semantically similar images as additional positive pairs. Our method mines semantic positives based on feature similarity and integrates with mainstream instance discrimination frameworks such as MoCo and SimSiam. On ImageNet, it achieves a 4.1% improvement in linear evaluation accuracy over MoCo-v2 (800 epochs). Gains are also reported on STL-10, CIFAR-10, and downstream object detection tasks. Our core contributions are threefold: (1) explicit modeling of semantic similarity as a prior for positive pair construction; (2) richer learned representations that retain features shared by semantically similar instances; and (3) generality across architectures and downstream tasks.
📝 Abstract
Self-supervised learning (SSL) algorithms based on instance discrimination have shown promising results, performing competitively with, or even outperforming, their supervised counterparts on some downstream tasks. Such approaches employ data augmentation to create two views of the same instance (i.e., a positive pair) and encourage the model to learn good representations by pulling these views closer in the embedding space without collapsing to a trivial solution. However, data augmentation is limited in the positive pairs it can represent, and the repulsion between instances during contrastive learning may discard important features shared by instances of similar categories. To address this issue, we propose an approach that identifies images with similar semantic content and treats them as positive instances, thereby reducing the chance of discarding important features during representation learning and increasing the richness of the latent representation. Our approach is generic and can work with any self-supervised instance discrimination framework, such as MoCo and SimSiam. To evaluate our method, we run experiments on three benchmark datasets (ImageNet, STL-10, and CIFAR-10) with different instance discrimination SSL approaches. The experimental results show that our approach consistently outperforms the baseline methods across all three datasets; for instance, we improve upon the vanilla MoCo-v2 by 4.1% on ImageNet under a linear evaluation protocol over 800 epochs. We also report results on semi-supervised learning, transfer learning on downstream tasks, and object detection.
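The idea of identifying images with similar semantic content can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the use of cosine similarity, and the threshold value of 0.9 are all assumptions made for illustration, and the toy 2-D vectors stand in for embeddings that a real pipeline would obtain from an encoder.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mine_semantic_positives(embeddings, threshold=0.9):
    """Return index pairs (i, j), i < j, whose similarity is at least `threshold`.

    `threshold` is a hypothetical hyperparameter, not a value from the paper.
    """
    pairs = []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cosine_similarity(embeddings[i], embeddings[j]) >= threshold:
                pairs.append((i, j))
    return pairs


# Toy embeddings: items 0 and 1 point in nearly the same direction,
# item 2 is orthogonal to item 0.
embeddings = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
]
pairs = mine_semantic_positives(embeddings, threshold=0.9)
print(pairs)
```

The pairs returned here would be treated as extra positives alongside the usual augmented views. The exhaustive pairwise loop is quadratic in the number of images, so a practical version would likely work on batches or use approximate nearest-neighbour search.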
Problem

Research questions and friction points this paper is trying to address.

Enhance visual representation learning in self-supervised instance discrimination methods
Address limitations of data augmentation in creating semantic positive pairs
Prevent discarding important features for instances with similar categories
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses semantic positive pairs for representation learning
Integrates with MoCo and SimSiam frameworks
Improves linear evaluation accuracy on ImageNet by 4.1% over MoCo-v2 (800 epochs), with gains also reported on STL-10 and CIFAR-10
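To make the integration point concrete, here is a hedged sketch of how a semantic positive could be fed to a MoCo-style contrastive objective in the same way as an augmented view. The summary above does not specify the loss, so the InfoNCE form, the temperature of 0.2, and all names below are assumptions for illustration. SimSiam, which uses no negatives, would need a different objective.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def info_nce(query, positive, negatives, temperature=0.2):
    """InfoNCE loss for one query with one positive key and a list of negative keys.

    `temperature` is an assumed hyperparameter, not taken from the source.
    """
    q = normalize(query)
    pos_logit = dot(q, normalize(positive)) / temperature
    logits = [pos_logit] + [dot(q, normalize(n)) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all logits.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - pos_logit


query = [1.0, 0.0]
augmented_view = [0.95, 0.05]     # second view of the same image
semantic_positive = [0.8, 0.2]    # a different image judged semantically similar
negatives = [[0.0, 1.0], [-1.0, 0.0]]

# The same loss is applied to both kinds of positive pair.
loss_aug = info_nce(query, augmented_view, negatives)
loss_sem = info_nce(query, semantic_positive, negatives)
print(loss_aug, loss_sem)
```

In this toy setup the semantic positive sits further from the query than the augmented view, so its loss is larger, and minimizing it would pull the two images together rather than pushing them apart as negatives.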