AI Summary
Unsupervised visible-infrared person re-identification (UVI-ReID) faces two challenges: cross-modal heterogeneity and pseudo-label noise, which manifests as noise overfitting, error accumulation, and noisy cluster correspondence. To address these, we propose RoDE, a robust duality learning framework that combines Robust Adaptive Learning (RAL), alternating dual-model training, and Cluster Consistency Matching (CCM). RoDE suppresses pseudo-label noise by dynamically reweighting samples, by alternately training two distinct models on each other's pseudo-labels, and by aligning clusters across modalities and across models using cross-cluster similarities. Extensive experiments on SYSU-MM01, RegDB, and LLVIP benchmarks demonstrate state-of-the-art performance, with mAP improvements of up to 6.2% over prior methods. Ablation studies confirm the effectiveness of each component in enhancing noise robustness and cross-modal generalization.
Abstract
Unsupervised visible-infrared person re-identification (UVI-ReID) aims to retrieve pedestrian images of the same individual across distinct modalities, which is challenging because of the inherent heterogeneity gap and the absence of cost-prohibitive annotations. Although existing methods employ self-training with clustering-generated pseudo-labels to bridge this gap, they implicitly assume that these pseudo-labels are correct. In practice, this assumption cannot be satisfied: training a perfect model is difficult, let alone without any ground truth, so pseudo-labeling errors arise. Based on this observation, this study introduces a new learning paradigm for UVI-ReID that accounts for Pseudo-Label Noise (PLN), which poses three challenges: noise overfitting, error accumulation, and noisy cluster correspondence. To overcome these challenges, we propose a novel robust duality learning framework (RoDE) for UVI-ReID that mitigates the adverse impact of noisy pseudo-labels. Specifically, for noise overfitting, we propose a Robust Adaptive Learning mechanism (RAL) that dynamically prioritizes clean samples while deprioritizing noisy ones, thus avoiding overemphasis on noise. To circumvent the error accumulation of self-training, where the model tends to confirm its own mistakes, RoDE alternately trains two distinct models, each using pseudo-labels predicted by its counterpart, thereby maintaining diversity and avoiding collapse into noise. However, this introduces cluster misalignment between the two models in addition to the misalignment between modalities, resulting in dual noisy cluster correspondence that makes optimization difficult. To address this issue, a Cluster Consistency Matching mechanism (CCM) is presented that ensures reliable alignment across modalities as well as across models by leveraging cross-cluster similarities.
Extensive experiments on three benchmark datasets demonstrate the effectiveness of the proposed RoDE.
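
To make the idea of matching clusters through cross-cluster similarities more concrete, the following is a minimal sketch, not the CCM procedure from the paper. It assumes each cluster is summarized by the L2-normalized mean of its features and that two clusters are paired only when each is the other's most similar centroid (a mutual-best-match rule chosen here purely for illustration). The function names `cluster_centroids` and `mutual_best_matches` and the toy data are hypothetical.

```python
import numpy as np


def cluster_centroids(features, labels):
    """Return the cluster ids and the L2-normalized mean feature of each cluster."""
    ids = np.unique(labels)
    cents = np.stack([features[labels == k].mean(axis=0) for k in ids])
    cents /= np.linalg.norm(cents, axis=1, keepdims=True)
    return ids, cents


def mutual_best_matches(feat_a, labels_a, feat_b, labels_b):
    """Pair clusters of A and B that are each other's most similar centroid.

    Assumption for illustration only: similarity is the cosine similarity
    between cluster centroids, and one-sided matches are discarded.
    """
    ids_a, cen_a = cluster_centroids(feat_a, labels_a)
    ids_b, cen_b = cluster_centroids(feat_b, labels_b)
    sim = cen_a @ cen_b.T        # cosine similarities, shape (|A|, |B|)
    best_b = sim.argmax(axis=1)  # closest B cluster for each A cluster
    best_a = sim.argmax(axis=0)  # closest A cluster for each B cluster
    return {
        int(ids_a[i]): int(ids_b[j])
        for i, j in enumerate(best_b)
        if best_a[j] == i
    }


# Toy data: two identities whose cluster ids are swapped between the two views.
visible = np.array([[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]])
vis_labels = np.array([0, 0, 1, 1])
infrared = np.array([[0.1, 1.0], [0.0, 0.8], [1.0, 0.2], [0.8, 0.1]])
ir_labels = np.array([0, 0, 1, 1])

matches = mutual_best_matches(visible, vis_labels, infrared, ir_labels)
print(matches)  # {0: 1, 1: 0}
```

In this toy case visible cluster 0 is paired with infrared cluster 1 and vice versa. The same function could in principle be applied to the cluster assignments of two different models rather than two modalities, though how RoDE actually does this is not specified in the abstract.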