On the Similarities of Embeddings in Contrastive Learning

📅 2025-06-11

📈 Citations: 0

✨ Influential: 0

career value

187K/year

🤖 AI Summary

Contrastive learning (CL) lacks a unified theoretical framework to characterize the distribution of embedding similarities between positive and negative pairs—particularly under small-batch settings, where high variance in negative-pair cosine similarities causes performance degradation. Method: We propose the first similarity-distribution-based unified theoretical framework for CL, revealing two fundamental insights: (i) inherent alignment limitations for positive pairs even under full-batch training; and (ii) high variance in negative-pair similarities as the critical bottleneck in small-batch regimes. Building on this, we design a variance-suppression auxiliary loss, theoretically grounded via similarity-threshold analysis and alignment proofs, and instantiated through cosine-similarity modeling. Results: Our loss consistently improves representation quality across multiple vision benchmarks. When integrated with SimCLR and MoCo under small batches (≤256), it yields an average +1.8% linear evaluation accuracy gain.

Technology Category

Application Category

📝 Abstract

Contrastive learning (CL) operates on a simple yet effective principle: embeddings of positive pairs are pulled together, while those of negative pairs are pushed apart. Although various forms of contrastive loss have been proposed and analyzed from different perspectives, prior works lack a comprehensive framework that systematically explains a broad class of these objectives. In this paper, we present a unified framework for understanding CL, which is based on analyzing the cosine similarity between embeddings of positive and negative pairs. In full-batch settings, we show that perfect alignment of positive pairs is unattainable when similarities of negative pairs fall below a certain threshold, and that this misalignment can be alleviated by incorporating within-view negative pairs. In mini-batch settings, we demonstrate that smaller batch sizes incur stronger separation among negative pairs within batches, which leads to higher variance in similarities of negative pairs. To address this limitation of mini-batch CL, we introduce an auxiliary loss term that reduces the variance of similarities of negative pairs in CL. Empirical results demonstrate that incorporating the proposed loss consistently improves the performance of CL methods in small-batch training.

Problem

Research questions and friction points this paper is trying to address.

Lack of unified framework for contrastive learning objectives

Misalignment in embeddings due to negative pair similarity thresholds

High variance in negative pair similarities with small batches

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified framework analyzing cosine similarity in CL

Auxiliary loss reduces variance in negative pairs

Within-view negative pairs alleviate misalignment issues

🔎 Similar Papers

No similar papers found.