🤖 AI Summary
In contrastive learning, nearest-neighbor positive samples are often "easy" pairs, already highly similar in embedding space, which limits positive diversity and representation discriminability. To address this, the authors propose CLSP (Contrastive Learning with Synthetic Positives): a framework that introduces images generated by an unconditional diffusion model as semantically consistent yet background-diverse *hard positives* for contrastive learning. Semantics-preserving image synthesis is achieved via feature interpolation in the diffusion sampling process, and the contrastive loss is extended to jointly optimize over both real and synthetic positives. In linear evaluation on benchmarks such as CIFAR-10, CLSP surpasses NNCLR and All4One by over 2% and 1%, respectively, and on transfer learning benchmarks it outperforms existing SSL frameworks on 6 of 8 downstream datasets. The core contribution is a diffusion-based paradigm for generating hard positives that enhances the discriminability and generalization of contrastive representations.
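The "latent-space feature interpolation" mentioned above can be pictured, in a loose sketch, as blending the anchor image's intermediate diffusion features into the sampling trajectory. The function name, the blending site, and the linear form are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def interpolate_latents(h_sample: np.ndarray, h_anchor: np.ndarray,
                        alpha: float = 0.3) -> np.ndarray:
    """Hypothetical sketch: linearly blend the anchor image's intermediate
    diffusion features (h_anchor) into the current sampling features
    (h_sample). A larger alpha preserves more of the anchor's semantics,
    while the rest of the sampling process diversifies the background."""
    return (1.0 - alpha) * h_sample + alpha * h_anchor
```

In this picture, the sampler runs as usual except that, at chosen steps, its intermediate features are nudged toward the anchor's, so the generated image stays semantically close to the anchor while the background varies.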
📝 Abstract
Contrastive learning with the nearest neighbor has proved to be one of the most efficient self-supervised learning (SSL) techniques by utilizing the similarity of multiple instances within the same class. However, its efficacy is constrained because the nearest-neighbor algorithm primarily identifies "easy" positive pairs, whose representations are already closely located in the embedding space. In this paper, we introduce a novel approach called Contrastive Learning with Synthetic Positives (CLSP) that utilizes synthetic images, generated by an unconditional diffusion model, as additional positives to help the model learn from diverse positives. Through feature interpolation in the diffusion sampling process, we generate images with distinct backgrounds yet semantic content similar to the anchor image. These images serve as "hard" positives for the anchor image, and when included as supplementary positives in the contrastive loss, they yield a linear-evaluation improvement of over 2% and 1% compared to the previous NNCLR and All4One methods across multiple benchmark datasets such as CIFAR10, achieving state-of-the-art performance. On transfer learning benchmarks, CLSP outperforms existing SSL frameworks on 6 out of 8 downstream datasets. We believe CLSP establishes a valuable baseline for future SSL studies incorporating synthetic data in the training process.
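The idea of treating the synthetic image as a supplementary positive in the contrastive loss can be sketched as follows. This is a minimal numpy illustration, assuming an InfoNCE-style objective with one extra weighted term for the synthetic positive; the function names, the weighting scheme `lam`, and the exact loss form are assumptions, not the paper's definition:

```python
import numpy as np

def _normalize(x: np.ndarray) -> np.ndarray:
    # L2-normalize each row so dot products become cosine similarities
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def _info_nce(q: np.ndarray, k: np.ndarray, temperature: float) -> float:
    # Standard InfoNCE: matching rows of q and k are positive pairs,
    # all other rows of k act as negatives for each anchor.
    logits = q @ k.T / temperature                   # (N, N) similarities
    log_denom = np.log(np.exp(logits).sum(axis=1))   # log-sum-exp per anchor
    pos = np.diag(logits)                            # positive-pair logits
    return float(np.mean(log_denom - pos))

def clsp_loss(anchor: np.ndarray, view: np.ndarray, synthetic: np.ndarray,
              temperature: float = 0.1, lam: float = 0.5) -> float:
    """Hypothetical sketch: one InfoNCE term for the standard augmented
    view plus a weighted term pulling the anchor toward its
    diffusion-generated "hard" positive."""
    a = _normalize(anchor)
    v = _normalize(view)
    s = _normalize(synthetic)
    return _info_nce(a, v, temperature) + lam * _info_nce(a, s, temperature)
```

Because the synthetic image shares semantics with the anchor but not its background, the extra term pulls together representations that vanilla augmentations (or nearest neighbors) would rarely pair, which is the intended source of the harder positive signal.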