🤖 AI Summary
Deep metric learning (DML) suffers from limited robustness to intra-class variability and label noise. To address this, we propose a continuous potential-field-based DML framework: each sample embedding is modeled as a differentiable attractive/repulsive potential field; superposition of these fields constructs a global semantic potential field that explicitly optimizes intra-class compactness and inter-class separation. We introduce, for the first time, a distance-adaptive potential decay mechanism and incorporate learnable proxy points to represent subpopulations, enabling end-to-end joint optimization. Our method achieves state-of-the-art performance on three major benchmarks—Cars-196, CUB-200-2011, and Stanford Online Products—while demonstrating significantly improved robustness to intra-class variation and label noise. This work establishes a novel continuous energy-based modeling paradigm for DML, advancing both theoretical formulation and practical resilience.
📝 Abstract
Deep metric learning (DML) involves training a network to learn a semantically meaningful representation space. Many current approaches mine n-tuples of examples and model interactions within each tuplets. We present a novel, compositional DML model that instead of in tuples, represents the influence of each example (embedding) by a continuous potential field, and superposes the fields to obtain their combined global potential field. We use attractive/repulsive potential fields to represent interactions among embeddings from images of the same/different classes. Contrary to typical learning methods, where mutual influence of samples is proportional to their distance, we enforce reduction in such influence with distance, leading to a decaying field. We show that such decay helps improve performance on real world datasets with large intra-class variations and label noise. Like other proxy-based methods, we also use proxies to succinctly represent sub-populations of examples. We evaluate our method on three standard DML benchmarks- Cars-196, CUB-200-2011, and SOP datasets where it outperforms state-of-the-art baselines.