🤖 AI Summary
This work reveals that contrastive learning (CL) fundamentally amounts to distribution alignment in the latent space. Grounded in entropy-regularized optimal transport (OT) theory, we establish, for the first time, a rigorous theoretical connection between CL and distribution alignment, unifying mainstream noise contrastive estimation (NCE)-based losses under a common framework. Building on this insight, we propose a customizable family of multi-step distribution alignment losses, enabling robust modeling under unbalanced or noisy views and explicit structural constraints on the representation space. Our method achieves both theoretical consistency and practical flexibility: it can be seamlessly integrated into existing CL pipelines while allowing explicit control over alignment granularity and geometric structure. Extensive experiments across multiple benchmark tasks demonstrate that the proposed losses significantly improve representation quality, generalization, and robustness to data noise, offering an interpretable, scalable, and principled new paradigm for contrastive learning.
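As a rough sketch of the claimed connection (notation ours, adapted from the standard entropic OT literature rather than taken verbatim from the paper), the alignment problem between the two view distributions is

$$
\min_{P \in \Pi(\mu,\nu)} \ \langle P, C \rangle \;-\; \varepsilon H(P),
\qquad
\Pi(\mu,\nu) = \{\, P \ge 0 \;:\; P\mathbf{1} = \mu,\ P^{\top}\mathbf{1} = \nu \,\},
$$

with cost $C_{ij} = -\mathrm{sim}(z_i, z'_j)$ between latents of the two views and entropy $H(P) = -\sum_{ij} P_{ij}(\log P_{ij} - 1)$. Sinkhorn's algorithm solves this by alternately normalizing the rows and columns of the kernel $K = e^{-C/\varepsilon}$; truncating it to a single row normalization recovers the softmax inside InfoNCE, which is, roughly, how NCE-style losses arise as one-step special cases of the alignment problem.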
📄 Abstract
Despite the success of contrastive learning (CL) in vision and language, its theoretical foundations and the mechanisms by which it builds representations remain poorly understood. In this work, we connect the noise contrastive estimation (NCE) losses widely used in CL to distribution alignment with entropic optimal transport (OT). This connection allows us to develop a family of new losses and multi-step iterative variants for existing CL methods. Intuitively, by using more information from the distribution of latents, our approach enables a more distribution-aware manipulation of the relationships within augmented sample sets. We provide theoretical insights and experimental evidence demonstrating the benefits of our approach for *generalized contrastive alignment*. Through this framework, we can leverage tools from OT to build unbalanced losses that handle noisy views and to customize the representation space by changing the constraints on alignment. By reframing contrastive learning as an alignment problem and leveraging existing optimization tools for OT, our work provides new insights into, and connections between, different self-supervised learning models, along with new tools that can more easily incorporate domain knowledge into learning.
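To make the multi-step idea concrete, here is a minimal PyTorch sketch (the names `sinkhorn_alignment_loss`, `tau`, and `n_iters` are ours for illustration, not the paper's API). It runs log-domain Sinkhorn iterations between two batches of paired embeddings and penalizes the resulting transport plan for placing mass off the positive-pair diagonal:

```python
import math
import torch
import torch.nn.functional as F

def sinkhorn_alignment_loss(z1, z2, tau=0.1, n_iters=3):
    """Hypothetical multi-step contrastive alignment loss (a sketch,
    not the paper's reference implementation).

    z1, z2: (n, d) embeddings of two augmented views, where row i of
    z1 is the positive pair of row i of z2. A single row normalization
    reduces to the InfoNCE softmax up to an additive constant; more
    iterations push the coupling toward the entropic OT plan with
    uniform marginals.
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    log_K = z1 @ z2.t() / tau                # log Gibbs kernel from cosine similarities
    n = z1.size(0)
    log_mu = -math.log(n)                    # log of the uniform marginal weights
    f = torch.zeros(n, device=z1.device)     # dual potentials (log row/column scalings)
    g = torch.zeros(n, device=z1.device)
    for _ in range(n_iters):
        # Row step: this update alone is the InfoNCE normalization.
        f = log_mu - torch.logsumexp(log_K + g[None, :], dim=1)
        # Column step: enforcing the second marginal is what goes
        # beyond one-step NCE-style losses.
        g = log_mu - torch.logsumexp(log_K + f[:, None], dim=0)
    log_P = f[:, None] + log_K + g[None, :]  # log of the (approximate) transport plan
    # Align the plan with the identity pairing: positives sit on the diagonal.
    return -log_P.diagonal().mean()

# Usage with a generic encoder over two augmentations of the same batch:
# loss = sinkhorn_alignment_loss(encoder(x_aug1), encoder(x_aug2))
```

An unbalanced variant in this spirit would relax the two marginal constraints (e.g., replacing them with KL penalties on the row and column sums) so that a noisy view is not forced to transport all of its mass, which is one way to read the abstract's "unbalanced losses to handle noisy views."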