🤖 AI Summary
Traditional face recognition models struggle to maintain identity consistency under non-frontal views, occlusions, or missing facial cues, and fail to account for appearance variations such as hairstyle. This work proposes a novel Head Similarity paradigm that extends identity recognition to structured whole-head appearance modeling. By leveraging hierarchical supervision and identity-aware knowledge distillation, the method explicitly captures intra-identity appearance variations and establishes a hierarchical similarity ranking between identity and appearance. Trained on large-scale long-form video data with weak supervision, the approach captures appearance-dependent similarity that conventional face recognition models miss across diverse poses, occlusions, and temporal appearance changes, demonstrating the feasibility of robust identity matching in appearance-sensitive scenarios.
📝 Abstract
Many vision applications require identity consistency beyond strict biometric recognition, especially under non-frontal views or when facial cues are missing. However, conventional face recognition models enforce intra-identity invariance: they collapse appearance variations such as hairstyle or styling changes into a single representation, which limits their use in appearance-sensitive scenarios. To address this limitation, we introduce Head Similarity, a new formulation that extends identity-centric recognition to structured whole-head similarity modeling. Our approach explicitly captures intra-identity appearance variation and enforces hierarchical similarity ordering across identity and appearance states, enabling meaningful comparison even under occlusion or rear-view conditions. We construct a large-scale benchmark from long-form videos with weakly supervised appearance states, covering diverse poses, occlusions, and temporal changes. As a first step, we develop a simple yet effective framework that jointly models identity discrimination and appearance-sensitive similarity through hierarchical supervision and identity-aware distillation. Experiments show that conventional face recognition models fail to capture appearance-dependent similarity, while our approach demonstrates the feasibility of structured whole-head similarity modeling.
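The abstract does not spell out how the hierarchical similarity ordering is enforced. One plausible reading is a margin-based ranking constraint: for an anchor head image, similarity to the same identity in the same appearance state should exceed similarity to the same identity in a different appearance state, which in turn should exceed similarity to a different identity. The sketch below illustrates that reading only. The function names, the use of cosine similarity, the hinge form, and the `margin` value are all assumptions made for illustration, not the authors' loss.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def hierarchical_margin_loss(anchor, same_state, other_state, other_identity,
                             margin=0.1):
    """Hypothetical hinge penalty for violating the ordering
    sim(same id, same appearance) > sim(same id, other appearance) > sim(other id).

    All arguments except `margin` are embedding vectors (lists of floats).
    """
    s_same = cosine(anchor, same_state)
    s_other_state = cosine(anchor, other_state)
    s_other_id = cosine(anchor, other_identity)

    # Appearance level: same-state pairs should beat cross-state pairs.
    appearance_term = max(0.0, margin - (s_same - s_other_state))
    # Identity level: cross-state pairs should still beat other identities.
    identity_term = max(0.0, margin - (s_other_state - s_other_id))
    return appearance_term + identity_term


# Toy 2-D embeddings, chosen by hand to satisfy the ordering.
anchor = [1.0, 0.0]
same_state = [1.0, 0.0]       # same person, same hairstyle
other_state = [1.0, 1.0]      # same person, different hairstyle
other_identity = [0.0, 1.0]   # different person

loss_ok = hierarchical_margin_loss(anchor, same_state, other_state, other_identity)
# Swapping the last two arguments breaks the identity-level ordering.
loss_bad = hierarchical_margin_loss(anchor, same_state, other_identity, other_state)
```

With these toy vectors the correctly ordered triple incurs no penalty, while the swapped triple is penalized by the identity-level term. A conventional face recognition objective would instead push `same_state` and `other_state` toward the same point, which is the intra-identity invariance the abstract argues against.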