Beyond the Unit Hypersphere: Embedding Magnitude in Contrastive Learning

πŸ“… 2026-02-09
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work challenges the common practice in contrastive learning of scoring with cosine similarity, which implicitly treats embedding norms as noise and discards any semantic information they carry. Through a systematic 2×2 ablation study that independently controls input-side and output-side normalization in both text and vision encoders, the authors investigate the functional role of embedding norms. They propose a task symmetry principle: preserving norm information significantly improves performance on asymmetric tasks such as text retrieval, but harms it on symmetric tasks such as semantic textual similarity. They further reveal an asymmetric functional distinction between input and output norms. Combining controlled normalization, ablation experiments, and Cohen's d effect size analysis, the study demonstrates that simply removing the redundant unit hypersphere constraint at inference yields zero-cost performance gains on dense text retrieval benchmarks.
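The 2×2 ablation described above can be sketched as a single scoring function with two normalization switches: with both on it reduces to cosine similarity, and with both off it is a raw dot product that keeps embedding magnitude. This is an illustrative sketch, not the authors' implementation; the function name and flags are my own.

```python
import numpy as np

def score(query, docs, normalize_input=True, normalize_output=True):
    """Similarity under a 2x2 input/output normalization ablation.

    Both flags True  -> cosine similarity (magnitude discarded).
    Both flags False -> dot product (magnitude preserved).
    Mixed settings give the off-diagonal cells of the 2x2 design.
    """
    q = query / np.linalg.norm(query) if normalize_input else query
    D = (docs / np.linalg.norm(docs, axis=1, keepdims=True)
         if normalize_output else docs)
    return D @ q  # one score per document

# toy embeddings (random stand-ins for encoder outputs)
rng = np.random.default_rng(0)
q = rng.normal(size=8)
D = rng.normal(size=(4, 8))

cos = score(q, D)                   # unit-hypersphere scoring
dot = score(q, D, False, False)     # magnitude-aware scoring
```

The "zero-cost" claim corresponds to switching from `cos` to `dot` at inference time only: no retraining, just dropping the output-side normalization.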

πŸ“ Abstract
Cosine similarity is prevalent in contrastive learning, yet it makes an implicit assumption: embedding magnitude is noise. Prior work occasionally found dot product and cosine similarity comparable, but left unanswered WHAT information magnitude carries, WHEN it helps, and HOW to leverage it. We conduct a systematic study through a $2 \times 2$ ablation that independently controls input-side and output-side normalization across text and vision models. Our findings reveal three key insights. First, in text retrieval, output (document) magnitude strongly correlates with relevance (Cohen's $d$ up to 1.80), yielding the largest gains on reasoning-intensive tasks. Second, input and output magnitudes serve asymmetric roles: output magnitude directly scales similarity scores while input magnitude modulates training dynamics. Third, magnitude learning benefits asymmetric tasks (text retrieval, RAG) but harms symmetric tasks (STS, text-image alignment). These findings establish a task symmetry principle: the choice between cosine and dot product depends on whether the task has distinct input roles, enabling cost-free improvements by simply removing an unnecessary constraint.
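The abstract's claim that output (document) magnitude correlates with relevance is quantified via Cohen's d. A minimal sketch of that effect-size computation, using the standard pooled-standard-deviation formula (the paper's exact estimator is an assumption here) and hypothetical embedding norms:

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d between two samples, with pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = np.sqrt(((na - 1) * np.var(a, ddof=1)
                      + (nb - 1) * np.var(b, ddof=1)) / (na + nb - 2))
    return (np.mean(a) - np.mean(b)) / pooled

# hypothetical L2 norms of document embeddings, split by relevance
relevant_norms = np.array([9.1, 9.4, 8.8, 9.6, 9.2])
irrelevant_norms = np.array([7.0, 7.3, 6.8, 7.1, 7.2])

d = cohens_d(relevant_norms, irrelevant_norms)  # large positive effect
```

A large positive `d` on real data would mean relevant documents systematically receive larger-norm embeddings, which is exactly the signal cosine similarity throws away.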
Problem

Research questions and friction points this paper is trying to address.

contrastive learning
embedding magnitude
cosine similarity
dot product
task symmetry
Innovation

Methods, ideas, or system contributions that make the work stand out.

Xincan Feng
Natural Language Processing Laboratory, Nara Institute of Science and Technology, Japan
Taro Watanabe
Nara Institute of Science and Technology
Machine Translation
Machine Learning