🤖 AI Summary
This work addresses the drop in generalization performance that contrastive learning suffers under distribution shift between training and test domains, particularly on domains unseen at training time, by proposing a domain-aware adaptive temperature control mechanism. The method incorporates domain labels into the InfoNCE loss, adjusting the temperature parameter according to the probability that a negative sample comes from the same domain as the anchor. This domain-conditioned reweighting of negative samples encourages domain-invariant representations. Evaluated in a multi-domain self-supervised pretraining setting, the approach outperforms existing domain generalization baselines on MNIST variant datasets, improving generalization to unseen domains while also maintaining strong performance on in-distribution tasks.
📝 Abstract
Self-supervised pre-training with contrastive learning is a powerful method for learning from sparsely labeled data. However, performance can drop considerably when there is a shift in the distribution of data from training to test time. We study this phenomenon in a setting in which the training data come from multiple domains, and the test data come from a domain not seen at training time that is subject to significant covariate shift. We present a new method for contrastive learning that incorporates domain labels to increase the domain invariance of learned representations, leading to improved out-of-distribution generalization. Our method adjusts the temperature parameter in the InfoNCE loss -- which controls the relative weighting of negative pairs -- using the probability that a negative sample comes from the same domain as the anchor. This upweights pairs from more similar domains, encouraging the model to discriminate samples based on domain-invariant attributes. Through experiments on a variant of the MNIST dataset, we demonstrate that our method yields better out-of-distribution performance than domain generalization baselines. Furthermore, our method maintains strong in-distribution task performance, substantially outperforming baselines on this measure.
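To make the mechanism concrete, here is a minimal sketch of an InfoNCE loss with a per-negative temperature conditioned on domain membership. The function name, the linear interpolation between `base_tau` and `min_tau`, and the specific values are illustrative assumptions, not the paper's exact formulation; the abstract specifies only that the temperature is adjusted using the probability that a negative shares the anchor's domain, with same-domain negatives upweighted.

```python
import numpy as np

def domain_weighted_info_nce(anchor, positive, negatives, neg_domain_probs,
                             base_tau=0.5, min_tau=0.1):
    """Illustrative sketch of a domain-conditioned InfoNCE loss.

    neg_domain_probs[i] is the probability that negative i comes from
    the same domain as the anchor. As that probability rises, the
    negative's temperature is lowered toward min_tau, which enlarges
    its logit and thus its weight in the loss -- pushing the model to
    separate same-domain negatives on non-domain attributes.
    (The interpolation schedule here is a hypothetical choice.)
    """
    def cos(a, b):
        # Cosine similarity between two embedding vectors.
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    pos_sim = cos(anchor, positive)
    neg_sims = np.array([cos(anchor, n) for n in negatives])

    # Per-negative temperature: base_tau when the negative is surely
    # from another domain, min_tau when it surely shares the domain.
    neg_taus = base_tau - (base_tau - min_tau) * np.asarray(neg_domain_probs)

    pos_logit = pos_sim / base_tau
    neg_logits = neg_sims / neg_taus

    # Standard InfoNCE: negative log-softmax of the positive pair
    # against all candidates, with a max-shift for numerical stability.
    logits = np.concatenate([[pos_logit], neg_logits])
    logits = logits - logits.max()
    return float(-(logits[0] - np.log(np.exp(logits).sum())))
```

Raising a negative's same-domain probability lowers its temperature and sharpens its contribution, so a hard same-domain negative increases the loss more than the same negative attributed to a different domain, which is the intended reweighting effect.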