See What You Seek: Semantic Contextual Integration for Cloth-Changing Person Re-Identification

📅 2024-12-02
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address performance degradation in cloth-changing person re-identification (CC-ReID) caused by clothing variations, this paper proposes Semantic Contextual Integration (SCI), a prompt learning framework that leverages CLIP's vision-language representations. The method jointly models identity-invariant cues and clothing-sensitive semantics. Specifically, it introduces: (1) a Semantic Separation Enhancement (SSE) module with dual learnable text tokens to explicitly disentangle clothing-related semantics from identity-relevant features; and (2) a Semantic-Guided Interaction Module (SIM), which uses orthogonalized text features to guide visual representations and sharpen discriminative identity cues. By integrating prompt learning, semantic disentanglement, and cross-modal interaction, the framework achieves state-of-the-art results on three CC-ReID benchmarks, effectively mitigating the feature ambiguity induced by clothing changes and improving robustness and accuracy in cross-scenario identity matching.

📝 Abstract
Cloth-changing person re-identification (CC-ReID) aims to match individuals across surveillance cameras despite variations in clothing. Existing methods typically mitigate the impact of clothing changes or enhance identity (ID)-relevant features, but they often struggle to capture complex semantic information. In this paper, we propose a novel prompt learning framework Semantic Contextual Integration (SCI), which leverages the visual-textual representation capabilities of CLIP to reduce clothing-induced discrepancies and strengthen ID cues. Specifically, we introduce the Semantic Separation Enhancement (SSE) module, which employs dual learnable text tokens to disentangle clothing-related semantics from confounding factors, thereby isolating ID-relevant features. Furthermore, we develop a Semantic-Guided Interaction Module (SIM) that uses orthogonalized text features to guide visual representations, sharpening the focus of the model on distinctive ID characteristics. This semantic integration improves the discriminative power of the model and enriches the visual context with high-dimensional insights. Extensive experiments on three CC-ReID datasets demonstrate that our method outperforms state-of-the-art techniques. The code will be released at https://github.com/hxy-499/CCREID-SCI.
Problem

Research questions and friction points this paper is trying to address.

Address cloth-changing challenges in person re-identification
Enhance ID features by disentangling clothing semantics
Improve discriminative power via visual-textual semantic integration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages CLIP for visual-textual representation
Uses dual text tokens for semantic separation
Guides visual features with orthogonalized text
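
The orthogonalization mentioned in the last point can be sketched as a Gram-Schmidt-style projection: the component of the clothing text feature lying along the identity text feature is removed, leaving a direction that carries clothing semantics without identity information. This is a minimal illustration only; the function name and the 512-dimensional embeddings are assumptions, not the paper's actual implementation.

```python
import numpy as np

def orthogonalize(t_cloth: np.ndarray, t_id: np.ndarray) -> np.ndarray:
    """Remove from the clothing text feature its projection onto the
    identity text feature, so the result is orthogonal to the ID direction."""
    proj = (t_cloth @ t_id) / (t_id @ t_id) * t_id
    return t_cloth - proj

# Illustrative random stand-ins for CLIP text embeddings (512-d assumed).
rng = np.random.default_rng(0)
t_id = rng.normal(size=512)     # embedding of an identity prompt
t_cloth = rng.normal(size=512)  # embedding of a clothing prompt

t_orth = orthogonalize(t_cloth, t_id)
print(np.isclose(t_orth @ t_id, 0.0))  # orthogonal to the identity direction
```

In the paper's setting, such an orthogonalized text feature would then guide the visual branch (e.g., via cross-modal attention) so the model attends away from clothing regions; that interaction is not reproduced here.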
Xiyu Han
Sanya Science and Education Innovation Park, Wuhan University of Technology, Sanya 572025, and also with the Hubei Key Laboratory of Transportation Internet of Things, School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan 430070, China
Xian Zhong
Sanya Science and Education Innovation Park, Wuhan University of Technology, Sanya 572025, and also with the Hubei Key Laboratory of Transportation Internet of Things, School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan 430070, China
Wenxin Huang
School of Computer Science and Information Engineering, Hubei University, Wuhan 430062, China
Xuemei Jia
Wuhan University
Trustworthy AI, Adversarial Attack, Video and image representation
Wenxuan Liu
School of Computer Science, Peking University, Beijing 100091, China
Xiaohan Yu
Macquarie University
computer vision, smart farming, ultra-fine-grained visual categorization
A.C. Kot
Rapid-Rich Object Search Lab, School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798