🤖 AI Summary
This work addresses the unclear mechanisms by which different loss functions in metric learning shape embedding geometry and optimization dynamics, a gap stemming from the absence of a systematic comparative framework. We propose two diagnostics, VARIANCE (intra- and inter-class variance) and GREEDINESS (activation sparsity and gradient norm), to uniformly evaluate seven mainstream losses (Contrastive, Triplet, N-pair, InfoNCE, ArcFace, SCL, and CCL) across five image-retrieval benchmarks. Our analysis reveals an efficiency-granularity trade-off in how losses preserve embedding diversity, convergence speed, and class structure: Triplet and SCL better maintain intra-class diversity, improving fine-grained retrieval performance, whereas Contrastive and InfoNCE accelerate embedding collapse at the cost of oversimplifying class structure. This study establishes the first unified diagnostic framework for the principled selection and design of metric learning losses.
📝 Abstract
Metric learning is central to retrieval, yet its effects on embedding geometry and optimization dynamics are not well understood. We introduce a diagnostic framework, VARIANCE (intra-/inter-class variance) and GREEDINESS (active ratio and gradient norms), to compare seven representative losses (Contrastive, Triplet, N-pair, InfoNCE, ArcFace, SCL, and CCL) across five image-retrieval datasets. Our analysis shows that Triplet and SCL preserve higher within-class variance and clearer inter-class margins, leading to stronger top-1 retrieval in fine-grained settings. In contrast, Contrastive and InfoNCE compact embeddings quickly through many small updates, accelerating convergence but potentially oversimplifying class structure. N-pair achieves large mean separation but with uneven spacing. These findings reveal an efficiency-granularity trade-off and yield practical guidance: prefer Triplet/SCL when diversity preservation and hard-sample discrimination are critical, and Contrastive/InfoNCE when faster embedding compaction is desired.
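The VARIANCE diagnostic can be illustrated with a minimal sketch. This assumes a standard centroid-based decomposition (intra-class scatter around each class centroid, inter-class scatter of centroids around the global mean); the paper's exact formulation may differ, and `variance_diagnostics` is a hypothetical helper name.

```python
import numpy as np

def variance_diagnostics(embeddings, labels):
    """Sketch of a VARIANCE-style diagnostic (assumed formulation):
    intra-class variance = mean squared distance of embeddings to their
    class centroid; inter-class variance = weighted mean squared distance
    of class centroids to the global mean."""
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    global_mean = embeddings.mean(axis=0)
    intra, inter = 0.0, 0.0
    for c in np.unique(labels):
        class_emb = embeddings[labels == c]     # all embeddings of class c
        centroid = class_emb.mean(axis=0)
        intra += ((class_emb - centroid) ** 2).sum()
        inter += len(class_emb) * ((centroid - global_mean) ** 2).sum()
    n = len(embeddings)
    return intra / n, inter / n
```

Under this definition, a loss that "collapses" each class drives the intra term toward zero while the inter term tracks class separation, so the ratio of the two summarizes how aggressively a loss compacts embeddings.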