🤖 AI Summary
Real-world object re-identification (ReID) frequently encounters cross-modal mismatches between query and gallery images (e.g., RGB/NIR/TIR combinations), yet most existing methods assume that the two sides share the same modalities, which limits their generalizability. To address this, we propose MDReID, an any-to-any image-level cross-modal ReID framework. Our approach uses modality-decoupled representation learning to decompose features into modality-shared and modality-specific components, and a tailored metric learning strategy that enforces orthogonality and complementarity between the two. Crucially, the framework handles both modality-matched and modality-mismatched inference within a single architecture. Experiments on the RGBNT201, RGBNT100, and MSVR310 benchmarks show consistent improvements over prior methods: mAP gains of 9.8%, 3.0%, and 11.5% under modality-matched settings, and average gains of 3.4%, 11.8%, and 10.9% under modality-mismatched settings, respectively.
📝 Abstract
Real-world object re-identification (ReID) systems often face modality inconsistencies, where query and gallery images come from different sensors (e.g., RGB, NIR, TIR). However, most existing methods assume modality-matched conditions, which limits their robustness and scalability in practical applications. To address this challenge, we propose MDReID, a flexible any-to-any image-level ReID framework designed to operate under both modality-matched and modality-mismatched scenarios. MDReID builds on the insight that modality information can be decomposed into two components: modality-shared features that are predictable and transferable, and modality-specific features that capture unique, modality-dependent characteristics. To leverage this insight, MDReID introduces two key components: Modality Decoupling Learning (MDL) and Modality-aware Metric Learning (MML). Specifically, MDL explicitly decomposes modality features into modality-shared and modality-specific representations, enabling effective retrieval in both modality-matched and modality-mismatched scenarios. MML, a tailored metric learning strategy, further enforces orthogonality and complementarity between the two components to enhance discriminative power across modalities. Extensive experiments on three challenging multi-modality ReID benchmarks (RGBNT201, RGBNT100, MSVR310) consistently demonstrate the superiority of MDReID. Notably, MDReID achieves mAP improvements of 9.8%, 3.0%, and 11.5% in general modality-matched scenarios, and average gains of 3.4%, 11.8%, and 10.9% in modality-mismatched scenarios, respectively. The code is available at: https://github.com/stone96123/MDReID.
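To make the orthogonality idea concrete, the following is a minimal sketch of one common way to penalize overlap between a shared and a specific feature vector, namely the mean squared cosine similarity of each pair. This is an illustrative assumption, not the loss used in MDReID; the function names `cosine` and `orthogonality_penalty` and the toy 2-D features are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Small epsilon guards against division by zero for all-zero vectors.
    return dot / (norm_u * norm_v + 1e-12)


def orthogonality_penalty(shared, specific):
    """Mean squared cosine similarity between paired shared/specific vectors.

    The value is 0 when every pair is orthogonal and approaches 1 when
    every pair is collinear. (Assumed form, for illustration only.)
    """
    pairs = list(zip(shared, specific))
    return sum(cosine(s, p) ** 2 for s, p in pairs) / len(pairs)


# Hypothetical 2-D features for two images.
shared = [[1.0, 0.0], [0.0, 2.0]]
specific_orthogonal = [[0.0, 3.0], [1.0, 0.0]]  # perpendicular to shared
specific_collinear = [[2.0, 0.0], [0.0, 1.0]]   # parallel to shared

print(orthogonality_penalty(shared, specific_orthogonal))
print(orthogonality_penalty(shared, specific_collinear))
```

Minimizing a penalty of this kind pushes the modality-specific vector to carry information that the modality-shared vector does not already encode, which is the intuition behind keeping the two components orthogonal.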