Global Minimizers of Sigmoid Contrastive Loss

📅 2025-09-22
📈 Citations: 0 · Influential: 0
🤖 AI Summary
This work investigates the mechanisms underlying cross-modal representation alignment under the sigmoid contrastive loss. Motivated by SigLIP's strong retrieval performance and its still-unexplained modality gap, the authors introduce a geometric construct, the (m, b_rel)-Constellation, which formally characterizes how the trainable inverse temperature and relative bias act together to drive the loss to zero. Leveraging spherical code theory, they derive the minimal embedding dimension required for high-quality representations and show that the modality gap stems from the non-uniform distribution of features on the unit hypersphere. They further propose a reparameterization of the loss with an explicit relative bias term; experiments on synthetic data show that this formulation improves training dynamics, accelerating convergence and enhancing representation quality.

📝 Abstract
The meta-task of obtaining and aligning representations through contrastive pretraining is steadily gaining importance since its introduction in CLIP and ALIGN. In this paper we theoretically explain the advantages of synchronizing with trainable inverse temperature and bias under the sigmoid loss, as implemented in the recent SigLIP and SigLIP2 models of Google DeepMind. Temperature and bias can drive the loss function to zero for a rich class of configurations that we call $(\mathsf{m}, \mathsf{b}_{\mathsf{rel}})$-Constellations. $(\mathsf{m}, \mathsf{b}_{\mathsf{rel}})$-Constellations are a novel combinatorial object related to spherical codes and are parametrized by a margin $\mathsf{m}$ and relative bias $\mathsf{b}_{\mathsf{rel}}$. We use our characterization of constellations to theoretically justify the success of SigLIP on retrieval, to explain the modality gap present in SigLIP, and to identify the necessary dimension for producing high-quality representations. Finally, we propose a reparameterization of the sigmoid loss with explicit relative bias, which improves training dynamics in experiments with synthetic data.
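For concreteness, the loss in question is the sigmoid contrastive loss introduced in SigLIP (Zhai et al., 2023). For a batch of $n$ image/text pairs with unit-norm embeddings $x_i, y_j$, trainable inverse temperature $t > 0$, and trainable bias $b$, it reads:

```latex
% SigLIP sigmoid loss; z_{ij} = +1 for matched pairs (i = j), -1 otherwise.
\[
  \mathcal{L}(t, b)
  = -\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n}
    \log \frac{1}{1 + e^{\, z_{ij} \left( -t \langle x_i, y_j \rangle - b \right)}},
  \qquad
  z_{ij} = \begin{cases} +1 & i = j, \\ -1 & i \neq j. \end{cases}
\]
```

Driving $\mathcal{L}$ to zero requires $t\langle x_i, y_i\rangle + b \to +\infty$ on matched pairs and $t\langle x_i, y_j\rangle + b \to -\infty$ on mismatched ones. A natural reading of the margin $\mathsf{m}$ and relative bias $\mathsf{b}_{\mathsf{rel}}$, to be checked against the paper's exact definitions, is that as $t \to \infty$ with $b \approx -t\,\mathsf{b}_{\mathsf{rel}}$, it suffices for matched similarities to sit above the threshold $\mathsf{b}_{\mathsf{rel}}$ and mismatched ones below it, separated by the margin $\mathsf{m}$.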
Problem

Research questions and friction points this paper is trying to address.

Theoretical analysis of sigmoid contrastive loss with trainable temperature and bias
Characterizing optimal configurations, called (m, b_rel)-Constellations, for representation learning (a related spherical-code bound is sketched after this list)
Explaining SigLIP model success on retrieval tasks and modality gap phenomena
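To give a flavor of the spherical-code arguments involved (this is a classical counting bound, not the paper's theorem): if unit vectors $x_1, \dots, x_N \in \mathbb{R}^d$ are pairwise anti-correlated, i.e. $\langle x_i, x_j \rangle \le -\varepsilon$ for all $i \neq j$, then

```latex
% Pairwise repulsion caps the number of vectors regardless of the dimension d.
\[
  0 \;\le\; \Bigl\| \sum_{i=1}^{N} x_i \Bigr\|^2
  \;=\; N + \sum_{i \neq j} \langle x_i, x_j \rangle
  \;\le\; N - N(N-1)\,\varepsilon
  \quad \Longrightarrow \quad
  N \le 1 + \frac{1}{\varepsilon}.
\]
```

So strong mutual repulsion caps the number of embeddable concepts independently of $d$, while demanding exact pairwise orthogonality forces $d \ge N$. Bounds of this flavor are what connect the constellation structure to the necessary embedding dimension.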
Innovation

Methods, ideas, or system contributions that make the work stand out.

Synchronizing temperature and bias under sigmoid loss
Introducing novel combinatorial objects called Constellations
Reparameterizing the sigmoid loss with an explicit relative bias (sketched in code below)
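Below is a minimal PyTorch sketch of the standard SigLIP loss alongside a hypothetical reparameterization with explicit relative bias. The `sigmoid_loss_rel` form (logits $= t(\langle x_i, y_j\rangle - \mathsf{b}_{\mathsf{rel}})$) is one plausible reading of "explicit relative bias," not the paper's verified formulation; names and initializations are illustrative.

```python
# Minimal sketch (PyTorch) of the SigLIP-style sigmoid contrastive loss and a
# hypothetical "explicit relative bias" reparameterization. The b_rel form is
# an assumption about the paper's construction, not its verified definition.
import torch
import torch.nn.functional as F

def sigmoid_loss(img, txt, log_t, b):
    """Standard SigLIP parameterization: logits = t * <x_i, y_j> + b."""
    img = F.normalize(img, dim=-1)  # embeddings live on the unit sphere
    txt = F.normalize(txt, dim=-1)
    logits = log_t.exp() * (img @ txt.T) + b                 # (n, n) pairwise logits
    labels = 2 * torch.eye(len(img), device=img.device) - 1  # +1 diag, -1 off-diag
    # Averaged over all n^2 pairs (SigLIP divides by n; the constant factor
    # does not change the minimizers).
    return -F.logsigmoid(labels * logits).mean()

def sigmoid_loss_rel(img, txt, log_t, b_rel):
    """Hypothetical reparameterization: logits = t * (<x_i, y_j> - b_rel),
    so b_rel acts directly as a similarity threshold on the sphere."""
    img = F.normalize(img, dim=-1)
    txt = F.normalize(txt, dim=-1)
    logits = log_t.exp() * (img @ txt.T - b_rel)
    labels = 2 * torch.eye(len(img), device=img.device) - 1
    return -F.logsigmoid(labels * logits).mean()

# Toy usage with random synthetic embeddings:
n, d = 8, 16
img, txt = torch.randn(n, d), torch.randn(n, d)
log_t = torch.tensor(2.303, requires_grad=True)  # t = exp(log_t) ~ 10 (SigLIP init)
b = torch.tensor(-10.0, requires_grad=True)      # SigLIP's bias init
b_rel = torch.tensor(0.5, requires_grad=True)    # illustrative threshold value
print(sigmoid_loss(img, txt, log_t, b).item())
print(sigmoid_loss_rel(img, txt, log_t, b_rel).item())
```

The two parameterizations agree when $b = -t\,\mathsf{b}_{\mathsf{rel}}$; making the threshold explicit decouples it from the temperature scale, which is plausibly the source of the improved training dynamics reported on synthetic data.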
Kiril Bangachev
Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, 02139
Guy Bresler
Massachusetts Institute of Technology
Theoretical computer science · Information theory · Statistics · Probability
Iliyas Noman
Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, 02139
Yury Polyanskiy
Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, 02139