On the Alignment Between Supervised and Self-Supervised Contrastive Learning

📅 2025-10-09
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates representational alignment—beyond loss-function equivalence—between self-supervised Contrastive Learning (CL) and negatives-only supervised Contrastive Learning (NSCL). Method: Leveraging theoretical analysis and empirical evaluation via Centered Kernel Alignment (CKA) and Representational Similarity Analysis (RSA), we compare representation dynamics across training trajectories under shared random initialization. Contribution/Results: We establish, for the first time, that CL and NSCL exhibit highly consistent representation evolution despite substantial divergence in parameter space. This alignment strengthens with larger model size and higher temperature, while being modulated by batch size and number of classes. Crucially, NSCL emerges as a pivotal bridge unifying self-supervised and supervised learning: its representation dynamics align more closely with CL than do conventional supervised objectives (e.g., cross-entropy). These findings offer a novel perspective on the unifying principles underlying contrastive learning paradigms.

📝 Abstract
Self-supervised contrastive learning (CL) has achieved remarkable empirical success, often producing representations that rival supervised pre-training on downstream tasks. Recent theory explains this by showing that the CL loss closely approximates a supervised surrogate, the Negatives-Only Supervised Contrastive Learning (NSCL) loss, as the number of classes grows. Yet this loss-level similarity leaves an open question: *Do CL and NSCL also remain aligned at the representation level throughout training, not just in their objectives?* We address this by analyzing the representation alignment of CL and NSCL models trained under shared randomness (same initialization, batches, and augmentations). First, we show that their induced representations remain similar: specifically, we prove that the similarity matrices of CL and NSCL stay close under realistic conditions. Our bounds provide high-probability guarantees on alignment metrics such as centered kernel alignment (CKA) and representational similarity analysis (RSA), and they clarify how alignment improves with more classes and higher temperatures, as well as its dependence on batch size. In contrast, we demonstrate that parameter-space coupling is inherently unstable: divergence between CL and NSCL weights can grow exponentially with training time. Finally, we validate these predictions empirically, showing that CL-NSCL alignment strengthens with scale and temperature, and that NSCL tracks CL more closely than other supervised objectives. This positions NSCL as a principled bridge between self-supervised and supervised learning. Our code and project page are available at https://github.com/DLFundamentals/understanding_ssl_v2 and https://dlfundamentals.github.io/cl-nscl-representation-alignment/.
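The two alignment metrics named in the abstract, CKA and RSA, both compare models through their sample-by-sample similarity matrices rather than their raw weights. As a rough illustration only (a minimal numpy sketch, not the authors' implementation; the helper names and the toy data are hypothetical), linear CKA and a Spearman-based RSA over two representation matrices can be computed as follows:

```python
import numpy as np

def linear_cka(X, Y):
    # Linear CKA between two representation matrices of shape
    # (n_samples, dim). Features are mean-centered per dimension.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    num = np.linalg.norm(Yc.T @ Xc, "fro") ** 2
    den = np.linalg.norm(Xc.T @ Xc, "fro") * np.linalg.norm(Yc.T @ Yc, "fro")
    return num / den

def rsa_spearman(X, Y):
    # RSA: Spearman correlation between the upper triangles of the two
    # models' sample-dissimilarity (here: correlation-distance) matrices.
    def rdm(Z):
        D = 1.0 - np.corrcoef(Z)
        return D[np.triu_indices_from(D, k=1)]
    a, b = rdm(X), rdm(Y)
    ra = np.argsort(np.argsort(a)).astype(float)  # ranks (assumes no ties)
    rb = np.argsort(np.argsort(b)).astype(float)
    return np.corrcoef(ra, rb)[0, 1]  # Pearson on ranks = Spearman

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 128))            # e.g. one model's batch embeddings
Y = X + 0.1 * rng.normal(size=(64, 128))  # a slightly perturbed counterpart
print(linear_cka(X, X))                   # identical representations -> 1.0
print(rsa_spearman(X, Y))                 # close to, but below, 1.0
```

Both scores are invariant to the nuisance transformations one expects between independently trained networks (CKA to orthogonal rotations and isotropic scaling of the features), which is why the paper can meaningfully compare CL and NSCL models even when their parameters diverge.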
Problem

Research questions and friction points this paper is trying to address.

Analyzing representation alignment between self-supervised and supervised contrastive learning
Proving similarity matrices remain close under shared training conditions
Investigating how alignment improves with scale, temperature, and batch size
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzing representation alignment under shared randomness
Proving similarity matrices remain close under realistic conditions
Validating alignment strengthens with scale and temperature
Achleshwar Luthra
Texas A&M University
Deep Learning · Computer Vision
Priyadarsi Mishra
Department of Computer Science and Engineering, Texas A&M University
Tomer Galanti
AI Researcher
Deep Learning · Machine Learning