MDReID: Modality-Decoupled Learning for Any-to-Any Multi-Modal Object Re-Identification

📅 2025-10-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Real-world object re-identification (ReID) frequently encounters cross-modal mismatches between query and gallery images (e.g., RGB/NIR/TIR combinations), yet most existing methods assume modality alignment, severely limiting generalizability. To address this, the paper proposes MDReID, an any-to-any image-level cross-modal ReID framework. The approach employs modality-decoupled representation learning to decompose features into shared discriminative components and modality-specific components, enforced by orthogonality and complementarity constraints that jointly promote modality-invariant feature extraction and fine-grained cross-modal alignment. Crucially, the framework unifies both modality-matched and modality-mismatched inference within a single architecture. Extensive experiments on the RGBNT201, RGBNT100, and MSVR310 benchmarks demonstrate state-of-the-art performance: mAP improvements of up to 11.5% under modality-matched settings and average gains of up to 11.8% under modality-mismatched settings.

📝 Abstract
Real-world object re-identification (ReID) systems often face modality inconsistencies, where query and gallery images come from different sensors (e.g., RGB, NIR, TIR). However, most existing methods assume modality-matched conditions, which limits their robustness and scalability in practical applications. To address this challenge, we propose MDReID, a flexible any-to-any image-level ReID framework designed to operate under both modality-matched and modality-mismatched scenarios. MDReID builds on the insight that modality information can be decomposed into two components: modality-shared features that are predictable and transferable, and modality-specific features that capture unique, modality-dependent characteristics. To effectively leverage this, MDReID introduces two key components: Modality Decoupling Learning (MDL) and Modality-aware Metric Learning (MML). Specifically, MDL explicitly decomposes modality features into modality-shared and modality-specific representations, enabling effective retrieval in both modality-aligned and mismatched scenarios. MML, a tailored metric learning strategy, further enforces orthogonality and complementarity between the two components to enhance discriminative power across modalities. Extensive experiments conducted on three challenging multi-modality ReID benchmarks (RGBNT201, RGBNT100, MSVR310) consistently demonstrate the superiority of MDReID. Notably, MDReID achieves significant mAP improvements of 9.8%, 3.0%, and 11.5% in general modality-matched scenarios, and average gains of 3.4%, 11.8%, and 10.9% in modality-mismatched scenarios, respectively. The code is available at: https://github.com/stone96123/MDReID
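The abstract gives no implementation details, but the decoupling idea behind MDL can be illustrated with a minimal PyTorch-style sketch. Everything below (module names, head structure, feature dimensions, per-modality heads) is an assumption for illustration, not the authors' code:

```python
import torch
import torch.nn as nn

class ModalityDecoupler(nn.Module):
    """Split a backbone feature into modality-shared and modality-specific parts.

    A minimal sketch of the MDL idea described in the abstract; the linear
    heads and dimensions are assumptions, not taken from the MDReID repo.
    """

    def __init__(self, feat_dim=768, num_modalities=3):
        super().__init__()
        # One head for the transferable, modality-shared component.
        self.shared_head = nn.Linear(feat_dim, feat_dim)
        # One head per modality (e.g., RGB / NIR / TIR) for the specific component.
        self.specific_heads = nn.ModuleList(
            nn.Linear(feat_dim, feat_dim) for _ in range(num_modalities)
        )

    def forward(self, feat, modality_idx):
        shared = self.shared_head(feat)                      # used for any-to-any matching
        specific = self.specific_heads[modality_idx](feat)   # modality-dependent cues
        return shared, specific
```

Under this reading, a modality-mismatched query would be matched against the gallery using only the shared component, while matched-modality retrieval can additionally exploit the specific component.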
Problem

Research questions and friction points this paper is trying to address.

Address modality inconsistencies in object re-identification across sensors
Enable robust retrieval in both modality-matched and mismatched scenarios
Decouple modality features into shared and specific representations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decouples modality features into shared and specific components
Enforces orthogonality between modality representations via metric learning (see the sketch after this list)
Enables flexible object re-identification across any sensor modalities
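As a rough sketch of how the orthogonality and complementarity constraints above could be enforced, here are two standard loss formulations; these are assumptions for illustration, not necessarily the exact losses used in MDReID's MML:

```python
import torch
import torch.nn.functional as F

def orthogonality_loss(shared, specific):
    """Push shared and specific features toward orthogonality.

    Penalizes the squared cosine similarity between the two components
    of each sample (a common formulation, assumed here).
    """
    shared = F.normalize(shared, dim=1)
    specific = F.normalize(specific, dim=1)
    cos = (shared * specific).sum(dim=1)  # per-sample cosine similarity
    return (cos ** 2).mean()

def complementarity_loss(shared, specific, feat):
    """Encourage the two components to jointly reconstruct the original feature."""
    return F.mse_loss(shared + specific, feat)
```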
🔎 Similar Papers
No similar papers found.
Yingying Feng
Northeastern University
Jie Li
Xiamen University
Jie Hu
National University of Singapore
Yukang Zhang
Xiamen University
Lei Tan
National University of Singapore
Jiayi Ji
Rutgers University