🤖 AI Summary
Visible-infrared person re-identification (VI-ReID) suffers from a significant modality discrepancy, yet existing methods overemphasize modality-invariant features while neglecting the discriminative value of modality-specific identity clues. To address this, we propose an identity-clue-driven cross-modal learning framework. First, a multi-perception feature refinement module aggregates shallow features from the shared branches to explicitly model subtle, easily overlooked modality-specific attributes. Second, a semantic distillation cascade enhancement mechanism distills identity-aware knowledge from those aggregated features and jointly optimizes modality-invariant and modality-specific representations. Third, an identity-clue-guided loss enhances discriminability and diversity in the feature space. Extensive experiments on standard benchmarks, including SYSU-MM01 and RegDB, demonstrate substantial improvements over state-of-the-art methods, validating the critical role of modality-specific identity knowledge in advancing VI-ReID.
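As a concrete illustration, the sketch below shows one plausible PyTorch realization of the shallow-feature aggregation and multi-perception refinement described above. The class name `MultiPerceptionRefinement`, the ResNet-50-style channel widths, and the pyramid-pooling internals are all assumptions for illustration, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiPerceptionRefinement(nn.Module):
    """Hypothetical MPFR-style module: aggregate shallow feature maps
    from a shared backbone, then refine them with multi-scale
    (multi-perception) pyramid pooling. All internals are assumptions."""

    def __init__(self, shallow_channels=(256, 512), width=512, pool_sizes=(1, 2, 4)):
        super().__init__()
        # Project each shallow stage to a common channel width.
        self.projs = nn.ModuleList(
            nn.Conv2d(c, width, kernel_size=1, bias=False) for c in shallow_channels
        )
        self.pool_sizes = pool_sizes
        # Fuse the aggregated map with its multi-scale context branches.
        self.fuse = nn.Sequential(
            nn.Conv2d(width * (len(pool_sizes) + 1), width, kernel_size=1, bias=False),
            nn.BatchNorm2d(width),
            nn.ReLU(inplace=True),
        )

    def forward(self, shallow_feats):
        # Align every stage to the spatial size of the deepest shallow map, then sum.
        h, w = shallow_feats[-1].shape[-2:]
        agg = sum(
            F.interpolate(proj(f), size=(h, w), mode="bilinear", align_corners=False)
            for proj, f in zip(self.projs, shallow_feats)
        )
        # Multi-perception context: pool at several grain sizes and upsample back.
        branches = [agg]
        for s in self.pool_sizes:
            pooled = F.adaptive_avg_pool2d(agg, s)
            branches.append(
                F.interpolate(pooled, size=(h, w), mode="bilinear", align_corners=False)
            )
        return self.fuse(torch.cat(branches, dim=1))

# Example with two hypothetical shallow stages of a shared ResNet-style backbone.
f1 = torch.randn(8, 256, 96, 32)                  # e.g. layer1 output
f2 = torch.randn(8, 512, 48, 16)                  # e.g. layer2 output
refined = MultiPerceptionRefinement()([f1, f2])   # -> (8, 512, 48, 16)
```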
📝 Abstract
Visible-Infrared Person Re-Identification (VI-ReID) is a challenging cross-modal matching task due to significant modality discrepancies. Current methods mainly learn modality-invariant features in a unified embedding space, attending only to the discriminative semantics shared across modalities while disregarding the critical role of modality-specific identity-aware knowledge in discriminative feature learning. To bridge this gap, we propose a novel Identity Clue Refinement and Enhancement (ICRE) network to mine and exploit the implicit discriminative knowledge inherent in modality-specific attributes. First, we design a Multi-Perception Feature Refinement (MPFR) module that aggregates shallow features from the shared branches to capture modality-specific attributes that are easily overlooked. Then, we propose a Semantic Distillation Cascade Enhancement (SDCE) module, which distills identity-aware knowledge from the aggregated shallow features and guides the learning of modality-invariant features. Finally, an Identity Clues Guided (ICG) loss is proposed to alleviate the modality discrepancies within the enhanced features and promote the learning of a diverse representation space. Extensive experiments on multiple public datasets clearly show that the proposed ICRE outperforms existing state-of-the-art methods.
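For intuition about the SDCE and ICG objectives, here is a minimal, hedged PyTorch sketch: a standard temperature-scaled KL distillation stands in for the semantic distillation step, and a cross-modal pull/push term stands in for the identity-clue-guided loss. The temperature, margin, and exact loss forms are assumptions, not the paper's published formulas.

```python
import torch
import torch.nn.functional as F

def semantic_distillation_loss(student_logits, teacher_logits, T=4.0):
    """Sketch of the distillation step: the aggregated shallow branch acts
    as the teacher whose identity-aware soft labels guide the
    modality-invariant student branch (temperature T is an assumption)."""
    p_teacher = F.softmax(teacher_logits.detach() / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)

def identity_clue_guided_loss(vis_feats, ir_feats, labels, margin=0.3):
    """Illustrative ICG-style objective: pull same-identity features
    together across modalities and push different identities apart,
    narrowing the modality gap while keeping the space diverse."""
    vis = F.normalize(vis_feats, dim=1)
    ir = F.normalize(ir_feats, dim=1)
    sim = vis @ ir.t()                                    # (B, B) cross-modal cosine similarity
    is_pos = labels.unsqueeze(1).eq(labels.unsqueeze(0))  # (B, B) same-identity mask
    pos = (sim * is_pos).sum(1) / is_pos.sum(1).clamp(min=1)  # mean positive similarity
    neg = sim.masked_fill(is_pos, -1.0).max(1).values         # hardest cross-modal negative
    return F.relu(neg - pos + margin).mean()

# Example: a paired visible/IR batch sharing identity labels.
vis = torch.randn(16, 2048)
ir = torch.randn(16, 2048)
labels = torch.randint(0, 8, (16,))
loss = semantic_distillation_loss(torch.randn(16, 100), torch.randn(16, 100)) \
       + identity_clue_guided_loss(vis, ir, labels)
```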