🤖 AI Summary
To address performance degradation in cloth-changing person re-identification (CC-ReID) caused by clothing variations, this paper proposes Semantic Contextual Integration (SCI), a prompt-learning framework that leverages CLIP’s vision-language representations. The method jointly models identity-invariant cues and clothing-sensitive semantics. Specifically, it introduces: (1) a Semantic Separation Enhancement (SSE) module with dual learnable text tokens that explicitly disentangles clothing-related semantics from identity semantics; and (2) a Semantic-Guided Interaction Module (SIM), which uses orthogonalized text features to guide visual representations and sharpen discriminative identity cues. By integrating prompt learning, semantic disentanglement, and cross-modal interaction, the framework achieves state-of-the-art results on three CC-ReID benchmarks, effectively mitigating the feature ambiguity induced by clothing changes and improving robustness and accuracy in cross-camera identity matching.
📝 Abstract
Cloth-changing person re-identification (CC-ReID) aims to match individuals across surveillance cameras despite variations in clothing. Existing methods typically mitigate the impact of clothing changes or enhance identity (ID)-relevant features, but they often struggle to capture complex semantic information. In this paper, we propose a novel prompt-learning framework, Semantic Contextual Integration (SCI), which leverages the visual-textual representation capabilities of CLIP to reduce clothing-induced discrepancies and strengthen ID cues. Specifically, we introduce the Semantic Separation Enhancement (SSE) module, which employs dual learnable text tokens to disentangle clothing-related semantics from confounding factors, thereby isolating ID-relevant features. Furthermore, we develop a Semantic-Guided Interaction Module (SIM) that uses orthogonalized text features to guide visual representations, sharpening the model's focus on distinctive ID characteristics. This semantic integration improves the discriminative power of the model and enriches the visual context with high-dimensional insights. Extensive experiments on three CC-ReID datasets demonstrate that our method outperforms state-of-the-art techniques. The code will be released at https://github.com/hxy-499/CCREID-SCI.
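To make the two core operations concrete, the sketch below illustrates the general idea behind them: a Gram–Schmidt step that orthogonalizes an identity text feature against a clothing text feature (the kind of separation SSE aims for), and a similarity-based re-weighting of visual features by the orthogonalized text feature (the kind of guidance SIM performs). This is a minimal illustration under assumed shapes — the random vectors stand in for real CLIP text/visual embeddings, and the exact losses and architecture of SSE/SIM in the paper differ.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8  # toy embedding size; CLIP features are much larger

# Hypothetical stand-ins for the text features produced by SSE's two
# learnable text tokens (real features come from CLIP's text encoder).
t_id = rng.normal(size=dim)     # identity-related text feature
t_cloth = rng.normal(size=dim)  # clothing-related text feature

def orthogonalize(t, t_ref):
    """Remove the component of t along t_ref (one Gram-Schmidt step)."""
    return t - (t @ t_ref) / (t_ref @ t_ref) * t_ref

# Separation: the identity direction becomes orthogonal to clothing.
t_id_orth = orthogonalize(t_id, t_cloth)
assert abs(t_id_orth @ t_cloth) < 1e-9  # disentangled directions

# Guidance: re-weight visual patch features by their cosine similarity
# to the orthogonalized identity text feature (softmax attention).
patches = rng.normal(size=(4, dim))  # hypothetical visual patch features

def l2norm(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

sims = l2norm(patches) @ l2norm(t_id_orth)   # cosine similarities
weights = np.exp(sims) / np.exp(sims).sum()  # attention weights, sum to 1
guided = weights @ patches                   # text-guided visual summary
```

The orthogonalization step guarantees that whatever the clothing token captures cannot leak into the identity direction, which is why the guided visual summary emphasizes clothing-invariant cues.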