🤖 AI Summary
To address the core challenge of cross-modal representation inconsistency in visible-infrared person re-identification (V-I ReID), this paper proposes a bidirectional multi-step domain generalization framework. The method learns modality-invariant part-level prototypes to construct hierarchical virtual intermediate domains, enabling progressive alignment between visible and infrared features. It introduces two key innovations: (1) a bidirectional multi-step feature refinement mechanism that jointly enhances intra- and inter-modal discriminability, and (2) a part-prototype-driven multi-level domain generation strategy that improves generalization across modalities. Extensive experiments on mainstream V-I ReID benchmarks demonstrate that the approach consistently outperforms existing part-based and single-intermediate-domain methods. Moreover, its modular design is plug-and-play: when integrated into other part-based frameworks, it delivers consistent performance gains. The framework achieves state-of-the-art results while maintaining architectural simplicity and broad compatibility.
📝 Abstract
A key challenge in visible-infrared person re-identification (V-I ReID) is training a backbone model capable of effectively addressing the significant discrepancies across modalities. State-of-the-art methods that generate a single intermediate bridging domain are often less effective, as this generated domain may not capture sufficient common discriminative information. This paper introduces Bidirectional Multi-step Domain Generalization (BMDG), a novel approach for unifying feature representations across modalities. BMDG creates multiple virtual intermediate domains by learning and aligning body part features extracted from both infrared (I) and visible (V) modalities. In particular, our method minimizes the cross-modal gap in two steps. First, BMDG aligns the modalities in feature space by learning shared, modality-invariant body part prototypes from V and I images. Then, it generalizes the feature representation through bidirectional multi-step learning, which progressively refines the representation at each step while incorporating more prototypes from both modalities. These prototypes define multiple bridging steps that further enhance the feature representation. Experiments on V-I ReID datasets indicate that BMDG outperforms state-of-the-art part-based and intermediate-domain generation methods, and can be integrated into other part-based methods to improve their V-I ReID performance. (Our code is available at: https://alehdaghi.github.io/BMDG/)
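The multi-step bridging idea described above can be illustrated as progressive interpolation between modality-specific part prototypes. The following is a toy sketch only, not the paper's actual training procedure; the function name `intermediate_domains` and the linear mixing scheme are illustrative assumptions:

```python
import numpy as np

def intermediate_domains(vis_parts, ir_parts, n_steps=3):
    """Toy illustration: build virtual intermediate domains by progressively
    mixing part-level prototype features from the two modalities.

    vis_parts, ir_parts: (n_parts, dim) arrays of part prototypes,
    assumed to be already aligned part-by-part across modalities.
    Returns a list of n_steps arrays, each one step closer to infrared.
    """
    domains = []
    for s in range(1, n_steps + 1):
        alpha = s / (n_steps + 1)  # mixing ratio grows with each bridging step
        domains.append((1 - alpha) * vis_parts + alpha * ir_parts)
    return domains
```

In this sketch, each successive domain incorporates a larger share of the opposite modality's prototypes, mimicking the progressive refinement from visible toward infrared (and symmetrically in the reverse direction) that the abstract describes.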