Diverse Semantics-Guided Feature Alignment and Decoupling for Visible-Infrared Person Re-Identification

📅 2025-05-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
VI-ReID faces dual challenges: large inter-modal discrepancies and strong style noise (e.g., illumination and color variations), both of which hinder effective feature alignment and weaken identity discriminability. To address these challenges, we propose a semantics-guided feature alignment and decoupling framework. First, we pioneer the use of diverse textual descriptions to align cross-modal visual features into the CLIP text embedding space. Second, we design a semantic margin constraint to disentangle and suppress modality-specific style information. Third, we introduce a semantic consistency-driven feature restitution module to preserve identity semantics. Our method jointly optimizes multi-granularity image-text alignment, contrastive learning, and margin constraints. Extensive experiments demonstrate state-of-the-art performance on the SYSU-MM01, RegDB, and LLCM benchmarks, with significant gains in cross-modal matching accuracy and robustness, particularly under low illumination and complex backgrounds.
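
As a concrete reading of the alignment step, the sketch below shows a symmetric image-text contrastive (InfoNCE) loss that pulls visual features from either modality toward CLIP-style text embeddings of matching pedestrian descriptions. This is a minimal sketch, assuming L2-normalized embeddings and a fixed temperature; the function name, temperature value, and batching scheme are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def image_text_alignment_loss(visual_feats, text_embeds, temperature=0.07):
    # visual_feats: (B, D) features from visible or infrared images
    # text_embeds:  (B, D) CLIP text embeddings of pedestrian descriptions,
    #               where row i of both tensors refers to the same identity
    v = F.normalize(visual_feats, dim=-1)
    t = F.normalize(text_embeds, dim=-1)
    logits = v @ t.t() / temperature                 # (B, B) scaled cosine similarities
    targets = torch.arange(v.size(0), device=v.device)
    loss_v2t = F.cross_entropy(logits, targets)      # image -> text direction
    loss_t2v = F.cross_entropy(logits.t(), targets)  # text -> image direction
    return (loss_v2t + loss_t2v) / 2
```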

📝 Abstract
Visible-Infrared Person Re-Identification (VI-ReID) is a challenging task due to the large modality discrepancy between visible and infrared images, which complicates the alignment of their features into a suitable common space. Moreover, style noise, such as illumination and color contrast, reduces the identity discriminability and modality invariance of features. To address these challenges, we propose a novel Diverse Semantics-guided Feature Alignment and Decoupling (DSFAD) network to align identity-relevant features from different modalities into a textual embedding space and to disentangle identity-irrelevant features within each modality. Specifically, we develop a Diverse Semantics-guided Feature Alignment (DSFA) module, which generates pedestrian descriptions with diverse sentence structures to guide the cross-modality alignment of visual features. Furthermore, to filter out style information, we propose a Semantic Margin-guided Feature Decoupling (SMFD) module, which decomposes visual features into pedestrian-related and style-related components and constrains the similarity between the former and the textual embeddings to be at least a margin higher than that between the latter and the textual embeddings. Additionally, to prevent the loss of pedestrian semantics during feature decoupling, we design a Semantic Consistency-guided Feature Restitution (SCFR) module, which further excavates information useful for identification from the style-related features, restores it to the pedestrian-related features, and constrains the similarity between the features after restitution and the textual embeddings to be consistent with that between the features before decoupling and the textual embeddings. Extensive experiments on three VI-ReID datasets demonstrate the superiority of our DSFAD.
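
The SMFD and SCFR constraints described above map onto two simple losses. The sketch below assumes cosine similarity as the image-text similarity measure, a hinge form for the margin, and a mean-squared penalty for the consistency term; DSFAD's exact formulations, margin value, and loss weights are not reproduced here.

```python
import torch.nn.functional as F

def semantic_margin_loss(ped_feats, style_feats, text_embeds, margin=0.2):
    # SMFD-style constraint: pedestrian-related features must be at least
    # `margin` more similar to the text embedding than style-related ones.
    sim_ped = F.cosine_similarity(ped_feats, text_embeds, dim=-1)
    sim_style = F.cosine_similarity(style_feats, text_embeds, dim=-1)
    return F.relu(sim_style - sim_ped + margin).mean()

def semantic_consistency_loss(restored_feats, original_feats, text_embeds):
    # SCFR-style constraint: similarity to the text after restitution should
    # match the similarity of the features before decoupling. The pre-decoupling
    # similarity is detached so it acts as a fixed target.
    sim_restored = F.cosine_similarity(restored_feats, text_embeds, dim=-1)
    sim_original = F.cosine_similarity(original_feats, text_embeds, dim=-1)
    return F.mse_loss(sim_restored, sim_original.detach())
```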
Problem

Research questions and friction points this paper is trying to address.

Align visible and infrared features into a common space
Reduce style noise for better identity discriminability
Decouple and restore identity-relevant pedestrian features
Innovation

Methods, ideas, or system contributions that make the work stand out.

Aligns features into textual embedding space
Decouples style and identity features
Restores lost semantics via consistency (both steps sketched below)
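
To make the decoupling and restitution steps concrete, here is one plausible shape for that pathway: a learned channel gate splits each feature into pedestrian-related and style-related parts, and a second gate pulls identity-relevant information back out of the style branch. The gating architecture is an assumption for illustration; the paper's actual module design may differ.

```python
import torch.nn as nn

class DecoupleRestore(nn.Module):
    # Hypothetical decouple-and-restore head; the gate design is illustrative.
    def __init__(self, dim):
        super().__init__()
        self.split_gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())
        self.restore_gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, feats):                # feats: (B, D) visual features
        g = self.split_gate(feats)
        ped = g * feats                      # pedestrian-related component
        style = (1 - g) * feats              # style-related component
        restored = ped + self.restore_gate(style) * style  # restitution
        return ped, style, restored
```

In training, the three sketched losses would typically be summed with standard identity losses over `ped` and `restored`; the weighting between terms is left unspecified here.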
Neng Dong
Nanjing University of Science and Technology
Shuanglin Yan
Nanjing University of Science and Technology
Liyan Zhang
Nanjing University of Aeronautics and Astronautics
Jinhui Tang
Nanjing University of Science and Technology