🤖 AI Summary
This work challenges the common practice in contrastive learning of using cosine similarity, which implicitly treats embedding norms as noise and discards any semantic information they carry. Through a systematic 2×2 ablation that independently controls input-side and output-side normalization in both text and vision encoders, the authors investigate the functional role of embedding norms. They propose a task symmetry principle: preserving norm information significantly improves performance on asymmetric tasks such as text retrieval but harms it on symmetric tasks. They further reveal a functional asymmetry between input and output norms. Combining controlled normalization ablations with Cohen's d effect-size analysis, the study shows that simply removing the redundant unit-hypersphere constraint at inference yields zero-cost performance gains on dense text retrieval benchmarks.
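The "zero-cost" change described above amounts to scoring with the raw dot product instead of cosine similarity at inference. A minimal sketch (not the authors' code; the vectors are made up for illustration) of how the two scores differ when documents share a direction but not a magnitude:

```python
import numpy as np

def cosine_scores(query, docs):
    # Cosine: normalize both sides, so magnitude is discarded as "noise".
    q = query / np.linalg.norm(query)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    return d @ q

def dot_scores(query, docs):
    # Dot product: document magnitude is kept and scales the score.
    return docs @ query

query = np.array([1.0, 2.0, 0.5])
docs = np.array([[2.0, 4.0, 1.0],    # same direction as query, larger norm
                 [1.0, 2.0, 0.5]])   # same direction, smaller norm

print(cosine_scores(query, docs))  # both documents tie: direction only
print(dot_scores(query, docs))     # the larger-norm document ranks higher
```

Because both scoring rules use the same embeddings, switching from cosine to dot product changes only the final similarity computation, hence the "zero-cost" framing.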
📄 Abstract
Cosine similarity is prevalent in contrastive learning, yet it makes an implicit assumption: embedding magnitude is noise. Prior work occasionally found dot product and cosine similarity comparable, but left unanswered WHAT information magnitude carries, WHEN it helps, and HOW to leverage it. We conduct a systematic study through a $2 \times 2$ ablation that independently controls input-side and output-side normalization across text and vision models. Our findings reveal three key insights. First, in text retrieval, output (document) magnitude strongly correlates with relevance (Cohen's $d$ up to 1.80), yielding the largest gains on reasoning-intensive tasks. Second, input and output magnitudes serve asymmetric roles: output magnitude directly scales similarity scores while input magnitude modulates training dynamics. Third, magnitude learning benefits asymmetric tasks (text retrieval, RAG) but harms symmetric tasks (STS, text-image alignment). These findings establish a task symmetry principle: the choice between cosine and dot product depends on whether the task has distinct input roles, enabling cost-free improvements by simply removing an unnecessary constraint.
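The abstract reports the norm-relevance correlation as a Cohen's $d$ effect size (up to 1.80). A hedged sketch of that measurement, using the standard pooled-standard-deviation formula and synthetic norm values in place of real embedding data:

```python
import numpy as np

def cohens_d(a, b):
    # Effect size between two samples: mean difference over pooled SD
    # (Bessel-corrected variances, standard pooled formula).
    na, nb = len(a), len(b)
    pooled_sd = np.sqrt(((na - 1) * np.var(a, ddof=1) +
                         (nb - 1) * np.var(b, ddof=1)) / (na + nb - 2))
    return (np.mean(a) - np.mean(b)) / pooled_sd

# Illustrative stand-ins for document embedding norms, NOT real results:
# relevant documents drawn with a higher mean norm than irrelevant ones.
rng = np.random.default_rng(0)
relevant_norms = rng.normal(loc=10.0, scale=1.0, size=500)
irrelevant_norms = rng.normal(loc=8.2, scale=1.0, size=500)

print(f"Cohen's d = {cohens_d(relevant_norms, irrelevant_norms):.2f}")
```

A $d$ near 1.8 means the two norm distributions are separated by almost two pooled standard deviations, i.e., document magnitude alone is a strong relevance signal.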