🤖 AI Summary
To address performance degradation in cloth-changing person re-identification (CC-ReID) caused by clothing variations, this paper proposes Semantic Contextual Integration (SCI), a prompt-learning framework that leverages CLIP’s vision-language representations. The method jointly models identity-invariant cues and clothing-sensitive semantics. Specifically, it introduces: (1) a Semantic Separation Enhancement (SSE) module with dual learnable text tokens that explicitly disentangles clothing-related semantics from identity semantics; and (2) a Semantic-Guided Interaction Module (SIM), which uses orthogonalized text features to guide visual representations and sharpen discriminative identity cues. By integrating prompt learning, semantic disentanglement, and cross-modal interaction, the framework achieves state-of-the-art results on three CC-ReID benchmarks, effectively mitigating the feature ambiguity induced by clothing changes and improving robustness and accuracy in cross-camera identity matching.
📝 Abstract
Cloth-changing person re-identification (CC-ReID) aims to match individuals across surveillance cameras despite variations in clothing. Existing methods typically mitigate the impact of clothing changes or enhance identity (ID)-relevant features, but they often struggle to capture complex semantic information. In this paper, we propose a novel prompt-learning framework, Semantic Contextual Integration (SCI), which leverages the visual-textual representation capabilities of CLIP to reduce clothing-induced discrepancies and strengthen ID cues. Specifically, we introduce the Semantic Separation Enhancement (SSE) module, which employs dual learnable text tokens to disentangle clothing-related semantics from confounding factors, thereby isolating ID-relevant features. Furthermore, we develop a Semantic-Guided Interaction Module (SIM) that uses orthogonalized text features to guide visual representations, sharpening the model's focus on distinctive ID characteristics. This semantic integration improves the discriminative power of the model and enriches the visual context with high-dimensional insights. Extensive experiments on three CC-ReID datasets demonstrate that our method outperforms state-of-the-art techniques. The code will be released at https://github.com/hxy-499/CCREID-SCI.
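To make the two core operations concrete, the sketch below illustrates the general idea behind them: a Gram–Schmidt step that orthogonalizes an identity text feature against a clothing text feature (the kind of separation SSE aims for), and a similarity-based re-weighting of visual features by the orthogonalized text feature (the kind of guidance SIM performs). This is a minimal illustration under assumed shapes — the random vectors stand in for real CLIP text/visual embeddings, and the exact losses and architecture of SSE/SIM in the paper differ.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8  # toy embedding size; CLIP features are much larger

# Hypothetical stand-ins for the text features produced by SSE's two
# learnable text tokens (real features come from CLIP's text encoder).
t_id = rng.normal(size=dim)     # identity-related text feature
t_cloth = rng.normal(size=dim)  # clothing-related text feature

def orthogonalize(t, t_ref):
    """Remove the component of t along t_ref (one Gram-Schmidt step)."""
    return t - (t @ t_ref) / (t_ref @ t_ref) * t_ref

# Separation: the identity direction becomes orthogonal to clothing.
t_id_orth = orthogonalize(t_id, t_cloth)
assert abs(t_id_orth @ t_cloth) < 1e-9  # disentangled directions

# Guidance: re-weight visual patch features by their cosine similarity
# to the orthogonalized identity text feature (softmax attention).
patches = rng.normal(size=(4, dim))  # hypothetical visual patch features

def l2norm(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

sims = l2norm(patches) @ l2norm(t_id_orth)   # cosine similarities
weights = np.exp(sims) / np.exp(sims).sum()  # attention weights, sum to 1
guided = weights @ patches                   # text-guided visual summary
```

The orthogonalization step guarantees that whatever the clothing token captures cannot leak into the identity direction, which is why the guided visual summary emphasizes clothing-invariant cues.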