🤖 AI Summary
This work addresses the challenging problem of person re-identification under long-term surveillance, where both cross-modality (visible/infrared) variation and clothing changes jointly degrade performance. To this end, we introduce and formally define a novel task termed Cross-Modality Clothing-Change ReID (CMCC-ReID). We construct SYSU-CMCC, the first real-world benchmark dataset for this setting, and propose a Progressive Identity Alignment (PIA) network. PIA employs a dual-branch decoupling strategy to disentangle identity-relevant features from clothing-specific ones and integrates bidirectional prototype learning to achieve joint intra- and inter-modality contrastive alignment. The proposed approach effectively mitigates the dual heterogeneity induced by modality discrepancy and appearance variation, significantly outperforming existing methods on SYSU-CMCC and establishing a strong baseline for CMCC-ReID research.
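To make the dual-branch decoupling idea concrete, below is a minimal, hypothetical sketch of how such a head could be wired up in PyTorch. Everything here is an illustrative assumption, not the paper's implementation: the module and branch names (`DualBranchDecoupler`, `id_branch`, `cloth_branch`), the use of linear projection heads, and the placeholder class counts are all invented for exposition.

```python
# Hypothetical sketch of a dual-branch decoupling head (not the paper's code).
# A shared backbone feature is split into an identity-relevant embedding and a
# clothing-specific embedding; the identity branch is supervised by person IDs,
# while the clothing branch absorbs outfit-specific variation.
import torch
import torch.nn as nn

class DualBranchDecoupler(nn.Module):
    def __init__(self, feat_dim=2048, embed_dim=512,
                 num_ids=100, num_clothes=200):  # placeholder sizes
        super().__init__()
        self.id_branch = nn.Sequential(
            nn.Linear(feat_dim, embed_dim), nn.BatchNorm1d(embed_dim), nn.ReLU())
        self.cloth_branch = nn.Sequential(
            nn.Linear(feat_dim, embed_dim), nn.BatchNorm1d(embed_dim), nn.ReLU())
        self.id_classifier = nn.Linear(embed_dim, num_ids)
        self.cloth_classifier = nn.Linear(embed_dim, num_clothes)

    def forward(self, backbone_feat):
        id_feat = self.id_branch(backbone_feat)        # identity-relevant cues
        cloth_feat = self.cloth_branch(backbone_feat)  # clothing-specific factors
        # Identity logits drive the ReID objective; clothing logits give the
        # clothing branch something to explain, steering outfit cues away
        # from the identity embedding.
        return id_feat, self.id_classifier(id_feat), self.cloth_classifier(cloth_feat)
```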
📝 Abstract
Person Re-Identification (ReID) faces severe challenges from modality discrepancy and clothing variation in long-term surveillance scenarios. While existing studies have made significant progress in either Visible-Infrared ReID (VI-ReID) or Clothing-Change ReID (CC-ReID), real-world surveillance systems often face both challenges simultaneously. To address this overlooked yet realistic problem, we define a new task, termed Cross-Modality Clothing-Change Re-Identification (CMCC-ReID), which targets pedestrian matching across variations in both modality and clothing. To advance research in this direction, we construct a new benchmark, SYSU-CMCC, where each identity is captured in both visible and infrared domains with distinct outfits, reflecting the dual heterogeneity of long-term surveillance. To tackle CMCC-ReID, we propose a Progressive Identity Alignment (PIA) network that progressively mitigates the issues of clothing variation and modality discrepancy. Specifically, a Dual-Branch Disentangling Learning (DBDL) module separates identity-related cues from clothing-related factors to achieve clothing-agnostic representations, and a Bi-Directional Prototype Learning (BPL) module performs intra-modality and inter-modality contrastive learning in the embedding space to bridge the modality gap while further suppressing clothing interference. Extensive experiments on the SYSU-CMCC dataset demonstrate that PIA significantly outperforms existing methods and establishes a strong baseline for this new task.
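The bi-directional prototype objective can be pictured as follows. This is a sketch under stated assumptions rather than the paper's BPL formulation: we assume one prototype per identity is maintained for each modality, that visible and infrared prototypes share the same identity indexing, and that each contrast term takes an InfoNCE-style form with equal weighting; the function names and the temperature value are ours.

```python
# Hypothetical sketch of bi-directional prototype contrastive alignment
# (an illustration of the general idea, not the paper's BPL module).
import torch
import torch.nn.functional as F

def prototype_contrast(feats, labels, prototypes, temperature=0.07):
    """InfoNCE-style loss pulling each feature toward its identity prototype."""
    feats = F.normalize(feats, dim=1)
    prototypes = F.normalize(prototypes, dim=1)    # [num_ids, dim]
    logits = feats @ prototypes.t() / temperature  # [batch, num_ids]
    return F.cross_entropy(logits, labels)

def bidirectional_prototype_loss(vis_feats, ir_feats, vis_labels, ir_labels,
                                 vis_protos, ir_protos):
    # Intra-modality: align features with prototypes of their own modality,
    # which suppresses clothing interference within each domain.
    intra = (prototype_contrast(vis_feats, vis_labels, vis_protos) +
             prototype_contrast(ir_feats, ir_labels, ir_protos))
    # Inter-modality: align each modality's features with the *other*
    # modality's prototypes, bridging the modality gap in both directions.
    # Assumes vis_protos and ir_protos are row-indexed by the same identity IDs.
    inter = (prototype_contrast(vis_feats, vis_labels, ir_protos) +
             prototype_contrast(ir_feats, ir_labels, vis_protos))
    return intra + inter
```

In this reading, "bi-directional" refers to the two cross-modality terms: visible features are pulled toward infrared prototypes and infrared features toward visible prototypes, so neither modality serves as the sole anchor.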