🤖 AI Summary
This work addresses the loosely structured latent space and limited generalization of contrastive self-supervised learning by introducing Sinkhorn entropy-regularized optimal-transport constraints into the SimCLR framework. Our method is the first to incorporate the geometry-aware properties of the Wasserstein distance into contrastive learning, explicitly regularizing feature distributions via Sinkhorn iterations to enhance global geometric consistency and inter-class separability. The architecture retains SimCLR’s Siamese (shared-encoder) design and standard image augmentations, adding only a lightweight optimal-transport loss. On multiple benchmarks, our approach outperforms SimCLR and matches or exceeds VICReg and Barlow Twins. UMAP visualizations reveal tighter class clusters and sharper decision boundaries, and downstream transfer accuracy and representation robustness improve markedly.
📝 Abstract
Self-supervised learning has revolutionized representation learning by eliminating the need for labeled data. Contrastive methods such as SimCLR maximize agreement between augmented views of an image but lack explicit regularization to enforce a globally structured latent space, which often leads to suboptimal generalization. We propose SinSim, a novel extension of SimCLR that integrates Sinkhorn regularization from optimal transport theory to enhance representation structure. The Sinkhorn loss, an entropy-regularized Wasserstein distance, encourages a well-dispersed, geometry-aware feature space while preserving discriminative power. Empirical evaluations on various datasets demonstrate that SinSim outperforms SimCLR and achieves performance competitive with prominent self-supervised methods such as VICReg and Barlow Twins. UMAP visualizations further reveal improved class separability and structured feature distributions. These results indicate that integrating optimal transport regularization into contrastive learning provides a principled and effective mechanism for learning robust, well-structured representations. Our findings open new directions for applying transport-based constraints in self-supervised learning frameworks.
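To make the regularizer concrete, the entropy-regularized optimal-transport cost described above can be sketched with plain Sinkhorn-Knopp iterations. This is a minimal NumPy illustration, not the paper's implementation: the function name `sinkhorn_loss`, the uniform batch marginals, and the hyperparameters `eps` and `n_iters` are assumptions for the sketch, and a practical version would run on GPU tensors inside the training loop.

```python
import numpy as np

def sinkhorn_loss(za, zb, eps=0.1, n_iters=50):
    """Entropy-regularized OT cost between two batches of embeddings.

    za, zb: (n, d) arrays, e.g. projections of the two augmented views.
    eps: entropy regularization strength; n_iters: Sinkhorn-Knopp steps.
    (Illustrative sketch; hyperparameters are assumed, not from the paper.)
    """
    n = za.shape[0]
    # Pairwise squared-Euclidean cost matrix between the two views.
    C = ((za[:, None, :] - zb[None, :, :]) ** 2).sum(-1)
    K = np.exp(-C / eps)                # Gibbs kernel
    a = b = np.full(n, 1.0 / n)         # uniform marginals over the batch
    u = np.ones(n)
    for _ in range(n_iters):            # alternating marginal projections
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]     # approximate transport plan
    return float((P * C).sum())         # <P, C>: regularized OT cost
```

In a SinSim-style objective this term would be added to the standard contrastive (InfoNCE) loss with a weighting coefficient, so the transport cost shapes the global geometry of the feature distribution while the contrastive term handles instance discrimination.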