🤖 AI Summary
Existing evaluation metrics for vision models fail to capture human sensitivity to identity, hindering progress on tasks such as personalized image generation. To address this limitation, this work proposes ID-Sim, a feed-forward identity-similarity metric that introduces the first similarity-learning framework explicitly designed for identity perception. The method leverages a high-quality training set that combines real-world diversity with controllable, synthetically generated data. Because ID-Sim is a deep feed-forward network, it can model both identity and contextual variables at a fine granularity. Evaluated on a newly established unified benchmark, ID-Sim significantly outperforms existing metrics across identity recognition, retrieval, and generation tasks, demonstrating strong alignment with human judgments.
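
For readers who want a concrete picture of the interface such a metric exposes, below is a minimal, hypothetical sketch of a feed-forward pairwise identity-similarity scorer in PyTorch. The backbone, layer sizes, and output range are illustrative assumptions only; the summary above does not specify ID-Sim's actual architecture or training loss.

```python
# Hypothetical sketch of a feed-forward pairwise identity-similarity scorer.
# The encoder, embedding size, and head design are illustrative assumptions,
# not the paper's architecture.
import torch
import torch.nn as nn


class PairwiseIdentityScorer(nn.Module):
    """Maps a pair of images to a scalar identity-similarity score in [0, 1]."""

    def __init__(self, embed_dim: int = 512):
        super().__init__()
        # Stand-in image encoder; a real metric would likely use a pretrained vision backbone.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(64, embed_dim),
        )
        # Feed-forward head that compares the two embeddings.
        self.head = nn.Sequential(
            nn.Linear(embed_dim * 2, 256),
            nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, img_a: torch.Tensor, img_b: torch.Tensor) -> torch.Tensor:
        za, zb = self.encoder(img_a), self.encoder(img_b)
        logits = self.head(torch.cat([za, zb], dim=-1))
        return torch.sigmoid(logits).squeeze(-1)  # similarity in [0, 1]


# Single forward pass over a dummy image pair.
scorer = PairwiseIdentityScorer()
a, b = torch.randn(1, 3, 224, 224), torch.randn(1, 3, 224, 224)
print(scorer(a, b).item())
```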
📝 Abstract
Humans show remarkable selective sensitivity to identity: they easily distinguish between highly similar identities, even across markedly different contexts such as changes in viewpoint or lighting. Vision models have struggled to match this capability, and progress on tasks that hinge on identity, such as personalized image generation, is slowed by the lack of identity-focused evaluation metrics. To facilitate progress, we propose ID-Sim, a feed-forward metric designed to faithfully reflect this human selective sensitivity. To build ID-Sim, we curate a high-quality training set of images spanning diverse real-world domains, augmented with generative synthetic data that provides controlled, fine-grained variations in identity and context. We evaluate ID-Sim on a new unified benchmark that assesses consistency with human annotations across identity-focused recognition, retrieval, and generation tasks.
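
As a rough illustration of the human-alignment evaluation described above, the sketch below compares the scores of an arbitrary pairwise similarity function against human ratings using rank correlation. The placeholder similarity function, the dummy data, and the label format are assumptions made for the example, not the benchmark's actual protocol.

```python
# Hypothetical human-agreement check: correlate a pairwise similarity
# function's scores with human similarity ratings. All inputs are dummies.
import torch
import torch.nn.functional as F
from scipy.stats import spearmanr


def placeholder_similarity(img_a: torch.Tensor, img_b: torch.Tensor) -> float:
    """Stand-in scorer: cosine similarity of flattened pixels (not a real identity metric)."""
    return F.cosine_similarity(img_a.flatten(1), img_b.flatten(1)).item()


torch.manual_seed(0)
# Dummy (reference, generated) image pairs with dummy human ratings in [0, 1].
pairs = [(torch.randn(1, 3, 224, 224), torch.randn(1, 3, 224, 224)) for _ in range(8)]
human_ratings = [0.9, 0.2, 0.7, 0.4, 0.8, 0.1, 0.6, 0.3]

metric_scores = [placeholder_similarity(a, b) for a, b in pairs]

# Rank correlation with human judgments is one common way to report alignment.
rho, _ = spearmanr(metric_scores, human_ratings)
print(f"Spearman correlation with human ratings: {rho:.3f}")
```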