🤖 AI Summary
To address unreliable cross-modal associations and significant modality bias in unsupervised visible-infrared person re-identification (USVI-ReID), this paper proposes a debiased global association framework. Methodologically, it introduces a modality-aware Jaccard distance to mitigate distribution discrepancies between visible and infrared features; designs a split-and-contrast strategy to construct modality-specific prototypes and achieves global instance-level clustering alignment via optimal transport; and jointly optimizes contrastive learning and prototype alignment to learn modality-invariant, identity-discriminative representations. The framework achieves state-of-the-art performance on mainstream benchmarks—including SYSU-MM01 and RegDB—demonstrating substantial improvements in cross-modal matching accuracy and robustness under fully unsupervised settings.
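The modality-aware Jaccard distance mentioned above is only named here, not specified. As a rough, hypothetical sketch of the idea (not the paper's exact formulation), the code below rectifies pairwise distances for the average visible-infrared offset before computing a Jaccard distance over k-nearest-neighbor sets; the function name, the mean-offset heuristic, and the choice of `k` are all illustrative assumptions.

```python
import numpy as np

def modality_aware_jaccard(feats, modality, k=5):
    """Illustrative sketch: rectify cross-modality distances, then
    compute Jaccard distance over k-NN sets. The paper's actual
    rectification may differ."""
    n = len(feats)
    # Euclidean pairwise distances
    d = np.linalg.norm(feats[:, None] - feats[None, :], axis=-1)
    cross = modality[:, None] != modality[None, :]
    same = ~cross & ~np.eye(n, dtype=bool)
    # Assumed rectification: subtract the mean surplus of cross-modality
    # distances so cross- and intra-modality distances become comparable.
    offset = d[cross].mean() - d[same].mean()
    d = d - offset * cross
    # k-nearest-neighbor sets under the rectified distance
    nn_sets = [set(row) for row in np.argsort(d, axis=1)[:, :k]]
    jac = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            inter = len(nn_sets[i] & nn_sets[j])
            union = len(nn_sets[i] | nn_sets[j])
            jac[i, j] = 1.0 - inter / union
    return jac
```

With the distance bias reduced, a global clustering step (e.g. DBSCAN over this matrix) would then associate samples across modalities more reliably.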
📝 Abstract
Unsupervised visible-infrared person re-identification (USVI-ReID) aims to match individuals across visible and infrared cameras without relying on any annotations. Given the significant gap between the visible and infrared modalities, estimating reliable cross-modality associations becomes a major challenge in USVI-ReID. Existing methods usually adopt optimal transport to associate intra-modality clusters, which is prone to propagating local clustering errors and overlooks global instance-level relations. By mining and attending to the visible-infrared modality bias, this paper addresses cross-modality learning from two aspects: bias-mitigated global association and modality-invariant representation learning. Motivated by camera-aware distance rectification in single-modality re-ID, we propose a modality-aware Jaccard distance to mitigate the distance bias caused by the modality discrepancy, so that more reliable cross-modality associations can be estimated through global clustering. To further improve cross-modality representation learning, a 'split-and-contrast' strategy is designed to obtain modality-specific global prototypes. By explicitly aligning these prototypes under global association guidance, modality-invariant yet ID-discriminative representations can be learned. While conceptually simple, our method obtains state-of-the-art performance on benchmark VI-ReID datasets and outperforms existing methods by a significant margin, validating its effectiveness.
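The prototype alignment via optimal transport is only named in the abstract; a minimal, generic sketch of entropy-regularized OT (Sinkhorn iterations) matching visible prototypes to infrared ones might look as follows. The cosine cost, uniform marginals, and hyperparameters here are assumptions for illustration, not the paper's exact solver.

```python
import numpy as np

def sinkhorn(cost, eps=0.1, iters=100):
    """Generic entropy-regularized OT between uniform marginals
    (a textbook Sinkhorn sketch, not the paper's exact method)."""
    m, n = cost.shape
    K = np.exp(-cost / eps)                 # Gibbs kernel
    a = np.full(m, 1.0 / m)                 # uniform source marginal
    b = np.full(n, 1.0 / n)                 # uniform target marginal
    u = np.ones(m)
    for _ in range(iters):                  # alternating scaling updates
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]      # transport plan

# Hypothetical usage: align L2-normalized modality-specific prototypes.
rng = np.random.default_rng(1)
vis = rng.normal(size=(6, 8))
ir = rng.normal(size=(6, 8))
vis /= np.linalg.norm(vis, axis=1, keepdims=True)
ir /= np.linalg.norm(ir, axis=1, keepdims=True)
plan = sinkhorn(1.0 - vis @ ir.T)           # cosine-distance cost
match = plan.argmax(axis=1)                 # infrared partner per visible prototype
```

The resulting soft plan (or its hard `argmax`) would then guide which prototype pairs are pulled together by the contrastive alignment objective.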