🤖 AI Summary
This work addresses the limited generalization of the InfoNCE loss in self-supervised contrastive learning, which stems from Monte Carlo integration error when modeling continuous domains. We propose a non-parametric conditional density estimation method based on Multiple Importance Sampling (MIS). By approximating the true conditional density via convex optimization, we derive a theoretically consistent contrastive objective and offer a probabilistic interpretation of InfoNCE's inherent bias. Our approach avoids explicit density parameterization, thereby circumventing the biases introduced by conventional discretization or restrictive distributional assumptions. Empirically, it achieves significant improvements over state-of-the-art baselines, including SimCLR and CLIP, on the large-scale image–text pretraining benchmarks CC3M and CC12M. The implementation is publicly available.
📝 Abstract
We study discriminative probabilistic modeling on a continuous domain for the data prediction task of (multimodal) self-supervised representation learning. To address the challenge of computing the integral in the partition function for each anchor data point, we leverage the multiple importance sampling (MIS) technique for robust Monte Carlo integration, which recovers the InfoNCE-based contrastive loss as a special case. Within this probabilistic modeling framework, we conduct a generalization error analysis that reveals the limitations of the current InfoNCE-based contrastive loss for self-supervised representation learning, and we derive insights for developing better approaches by reducing the error of the Monte Carlo integration. To this end, we propose a novel non-parametric method for approximating the sum of conditional probability densities required by MIS through convex optimization, yielding a new contrastive objective for self-supervised representation learning. Moreover, we design an efficient algorithm for solving the proposed objective. We empirically compare our algorithm to representative baselines on the contrastive image–language pretraining task. Experimental results on the CC3M and CC12M datasets demonstrate the superior overall performance of our algorithm. Our code is available at https://github.com/bokun-wang/NUCLR.
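To make the "special case" claim above concrete: the standard InfoNCE loss can be read as a single-proposal Monte Carlo estimate of each anchor's log partition function, with the other in-batch samples serving as proposal draws (MIS generalizes this to a mixture of proposals with per-sample weights). The following is a minimal NumPy sketch of that standard InfoNCE computation, not the NUCLR objective itself; the function name `infonce_loss` and the temperature argument `tau` are illustrative choices.

```python
import numpy as np

def infonce_loss(z_a, z_b, tau=0.1):
    """InfoNCE loss for a batch of paired embeddings z_a[i] <-> z_b[i].

    The log-partition term for each anchor is a Monte Carlo sum over the
    in-batch candidates, i.e. a single-proposal importance-sampling
    estimate of the integral over the continuous data domain.
    """
    # Normalize embeddings to the unit sphere, as in SimCLR / CLIP.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)

    logits = (z_a @ z_b.T) / tau          # (n, n) scaled similarities

    # Numerically stable log-partition estimate per anchor (log-sum-exp
    # over all in-batch candidates, positives and negatives alike).
    m = logits.max(axis=1, keepdims=True)
    log_Z = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))

    pos = np.diag(logits)                 # similarity of each positive pair
    return float(np.mean(log_Z - pos))
```

Because the positive pair is included in the candidate sum, `log_Z >= pos` row-wise, so the loss is non-negative; for perfectly aligned pairs it is bounded above by `log n`. The paper's analysis attributes InfoNCE's bias to the error of exactly this kind of Monte Carlo estimate.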