Diverse Semantics-Guided Feature Alignment and Decoupling for Visible-Infrared Person Re-Identification

📅 2025-05-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
VI-ReID faces dual challenges: large inter-modal discrepancies and strong style noise (e.g., illumination and color variations), both of which hinder effective feature alignment and weaken identity discriminability. To address these challenges, we propose a semantics-guided feature alignment and decoupling framework. First, we pioneer the use of diverse textual descriptions to align cross-modal visual features into the CLIP text embedding space. Second, we design a semantic margin constraint to disentangle and suppress modality-specific style information. Third, we introduce a semantic consistency-driven feature restitution module to preserve identity semantics. Our method jointly optimizes multi-granularity image-text alignment, contrastive learning, and margin constraints. Extensive experiments demonstrate state-of-the-art performance on the SYSU-MM01, RegDB, and LLCM benchmarks, with significant gains in cross-modal matching accuracy and robustness, particularly under low illumination and complex backgrounds.
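
As a concrete reading of the alignment step, the sketch below shows a symmetric image-text contrastive (InfoNCE) loss that pulls visual features from either modality toward CLIP-style text embeddings of matching pedestrian descriptions. This is a minimal sketch, assuming L2-normalized embeddings and a fixed temperature; the function name, temperature value, and batching scheme are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def image_text_alignment_loss(visual_feats, text_embeds, temperature=0.07):
    # visual_feats: (B, D) features from visible or infrared images
    # text_embeds:  (B, D) CLIP text embeddings of pedestrian descriptions,
    #               where row i of both tensors refers to the same identity
    v = F.normalize(visual_feats, dim=-1)
    t = F.normalize(text_embeds, dim=-1)
    logits = v @ t.t() / temperature                 # (B, B) scaled cosine similarities
    targets = torch.arange(v.size(0), device=v.device)
    loss_v2t = F.cross_entropy(logits, targets)      # image -> text direction
    loss_t2v = F.cross_entropy(logits.t(), targets)  # text -> image direction
    return (loss_v2t + loss_t2v) / 2
```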

📝 Abstract
Visible-Infrared Person Re-Identification (VI-ReID) is a challenging task due to the large modality discrepancy between visible and infrared images, which complicates the alignment of their features into a suitable common space. Moreover, style noise, such as illumination and color contrast, reduces the identity discriminability and modality invariance of features. To address these challenges, we propose a novel Diverse Semantics-guided Feature Alignment and Decoupling (DSFAD) network to align identity-relevant features from different modalities into a textual embedding space and to disentangle identity-irrelevant features within each modality. Specifically, we develop a Diverse Semantics-guided Feature Alignment (DSFA) module, which generates pedestrian descriptions with diverse sentence structures to guide the cross-modality alignment of visual features. Furthermore, to filter out style information, we propose a Semantic Margin-guided Feature Decoupling (SMFD) module, which decomposes visual features into pedestrian-related and style-related components and constrains the similarity between the former and the textual embeddings to be at least a margin higher than that between the latter and the textual embeddings. Additionally, to prevent the loss of pedestrian semantics during feature decoupling, we design a Semantic Consistency-guided Feature Restitution (SCFR) module, which further excavates information useful for identification from the style-related features, restores it to the pedestrian-related features, and constrains the similarity between the features after restitution and the textual embeddings to be consistent with that between the features before decoupling and the textual embeddings. Extensive experiments on three VI-ReID datasets demonstrate the superiority of our DSFAD.
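
The SMFD and SCFR constraints described above map onto two simple losses. The sketch below assumes cosine similarity as the image-text similarity measure, a hinge form for the margin, and a mean-squared penalty for the consistency term; DSFAD's exact formulations, margin value, and loss weights are not reproduced here.

```python
import torch.nn.functional as F

def semantic_margin_loss(ped_feats, style_feats, text_embeds, margin=0.2):
    # SMFD-style constraint: pedestrian-related features must be at least
    # `margin` more similar to the text embedding than style-related ones.
    sim_ped = F.cosine_similarity(ped_feats, text_embeds, dim=-1)
    sim_style = F.cosine_similarity(style_feats, text_embeds, dim=-1)
    return F.relu(sim_style - sim_ped + margin).mean()

def semantic_consistency_loss(restored_feats, original_feats, text_embeds):
    # SCFR-style constraint: similarity to the text after restitution should
    # match the similarity of the features before decoupling. The pre-decoupling
    # similarity is detached so it acts as a fixed target.
    sim_restored = F.cosine_similarity(restored_feats, text_embeds, dim=-1)
    sim_original = F.cosine_similarity(original_feats, text_embeds, dim=-1)
    return F.mse_loss(sim_restored, sim_original.detach())
```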
Problem

Research questions and friction points this paper is trying to address.

Align visible and infrared features into a common space
Reduce style noise for better identity discriminability
Decouple and restore identity-relevant pedestrian features
Innovation

Methods, ideas, or system contributions that make the work stand out.

Aligns features into textual embedding space
Decouples style and identity features
Restores lost semantics via consistency (both steps sketched below)
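
To make the decoupling and restitution steps concrete, here is one plausible shape for that pathway: a learned channel gate splits each feature into pedestrian-related and style-related parts, and a second gate pulls identity-relevant information back out of the style branch. The gating architecture is an assumption for illustration; the paper's actual module design may differ.

```python
import torch.nn as nn

class DecoupleRestore(nn.Module):
    # Hypothetical decouple-and-restore head; the gate design is illustrative.
    def __init__(self, dim):
        super().__init__()
        self.split_gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())
        self.restore_gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, feats):                # feats: (B, D) visual features
        g = self.split_gate(feats)
        ped = g * feats                      # pedestrian-related component
        style = (1 - g) * feats              # style-related component
        restored = ped + self.restore_gate(style) * style  # restitution
        return ped, style, restored
```

In training, the three sketched losses would typically be summed with standard identity losses over `ped` and `restored`; the weighting between terms is left unspecified here.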
Neng Dong
Nanjing University of Science and Technology
Shuanglin Yan
Nanjing University of Science and Technology
Liyan Zhang
Nanjing University of Aeronautics and Astronautics
Jinhui Tang
Nanjing University of Science and Technology