🤖 AI Summary
To address two challenges in visible-infrared person re-identification (VI-ReID), namely the susceptibility of appearance features to modality discrepancy and background clutter and the inaccuracy of existing shape estimation in the infrared domain, this paper proposes Shape-centered Representation Learning (ScRL), a framework that integrates body shape and appearance features. It has three components: Infrared Shape Restoration (ISR), which uses infrared appearance features to correct inaccurate infrared shape representations at the feature level; Shape Feature Propagation (SFP), which allows shape features to be extracted directly from the original images at inference time with minimal computational overhead; and Appearance Feature Enhancement (AFE), which uses shape features to emphasize shape-related appearance features and suppress identity-unrelated noise. Experiments on SYSU-MM01, HITSZ-VCM, and RegDB report Rank-1/mAP scores of 76.1%/72.6%, 71.2%/52.9%, and 92.4%/86.7%, respectively, surpassing the state-of-the-art methods the paper compares against.
📝 Abstract
Visible-Infrared Person Re-Identification (VI-ReID) plays a critical role in all-day surveillance systems. However, existing methods primarily focus on learning appearance features while overlooking body shape features, which not only complement appearance features but are also inherently robust to modality variations. Despite this potential, effectively integrating shape and appearance features remains challenging. Appearance features are highly susceptible to modality variations and background noise, while shape features often suffer from inaccurate infrared shape estimation due to the limitations of auxiliary models. To address these challenges, we propose the Shape-centered Representation Learning (ScRL) framework, which improves VI-ReID performance by integrating shape and appearance features. Specifically, we introduce Infrared Shape Restoration (ISR), which corrects inaccuracies in infrared body shape representations at the feature level by leveraging infrared appearance features. In addition, we propose Shape Feature Propagation (SFP), which enables shape features to be extracted directly from the original images during inference with minimal computational complexity. Furthermore, we design Appearance Feature Enhancement (AFE), which uses shape features to emphasize shape-related appearance features while suppressing identity-unrelated noise. Benefiting from this integration of shape and appearance features, ScRL achieves superior performance in extensive experiments. On the SYSU-MM01, HITSZ-VCM, and RegDB datasets, it achieves Rank-1 (mAP) accuracies of 76.1% (72.6%), 71.2% (52.9%), and 92.4% (86.7%), respectively, surpassing existing state-of-the-art methods.
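For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how Rank-1 accuracy and mAP are commonly computed for a retrieval task, given a query-to-gallery distance matrix and identity labels. The function name `rank1_and_map` and its arguments are illustrative and do not come from the paper. Dataset-specific protocol details (such as camera-based filtering or repeated gallery sampling) are omitted, so this is not the exact evaluation code behind the numbers above.

```python
def rank1_and_map(dist, query_ids, gallery_ids):
    """Return (Rank-1 accuracy, mean average precision).

    dist[q][g] is the distance between query q and gallery item g
    (smaller means more similar). Hypothetical helper for illustration.
    """
    rank1_hits, average_precisions = [], []
    for row, qid in zip(dist, query_ids):
        # Gallery indices sorted from nearest to farthest.
        order = sorted(range(len(row)), key=row.__getitem__)
        matches = [gallery_ids[g] == qid for g in order]
        if not any(matches):
            continue  # a query with no correct gallery item is skipped
        # Rank-1: is the nearest gallery item the correct identity?
        rank1_hits.append(1.0 if matches[0] else 0.0)
        # Average precision: mean of precision at each correct match.
        hits, precisions = 0, []
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                hits += 1
                precisions.append(hits / rank)
        average_precisions.append(sum(precisions) / len(precisions))
    if not rank1_hits:
        raise ValueError("no query has a matching gallery identity")
    n = len(rank1_hits)
    return sum(rank1_hits) / n, sum(average_precisions) / n


# Toy example: two queries, three gallery items.
dist = [[0.1, 0.5, 0.9],
        [0.2, 0.8, 0.3]]
rank1, mean_ap = rank1_and_map(dist, query_ids=[1, 2], gallery_ids=[1, 2, 1])
print(f"Rank-1 = {rank1:.3f}, mAP = {mean_ap:.3f}")
```

In the toy example the first query retrieves a correct match at rank 1, while the second query's only correct match sits at rank 3. Rank-1 only looks at the top result, whereas mAP also reflects how far down the ranking the remaining correct matches appear.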