🤖 AI Summary
This paper introduces Omni Multi-modal Person Re-identification (OM-ReID), a new task enabling cross-modal person retrieval under arbitrary query combinations of RGB, infrared, sketch, color-pencil, and text modalities. To address the limitations of existing methods, namely restricted modality coverage and insufficient unified modeling capability, the authors: (1) construct ORBench, the first high-quality five-modal benchmark; (2) propose ReID5o, a single-model unified encoder with a dynamic multi-expert routing architecture that achieves modality-agnostic feature alignment and collaborative fusion; and (3) incorporate cross-modal contrastive learning to enhance semantic consistency across heterogeneous modalities. Extensive experiments on ORBench show that the proposed method significantly outperforms state-of-the-art approaches, achieving robust and efficient retrieval across all possible modality combinations for the first time. The dataset and code will be publicly released.
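The summary's "dynamic multi-expert routing" can be pictured as a learned gate that mixes several expert projections per input token. The sketch below is a minimal, hypothetical numpy illustration of this general mixture-of-experts routing pattern, not the paper's actual ReID5o implementation; all parameter names and sizes (`DIM`, `NUM_EXPERTS`, `gate_W`, `expert_W`) are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

DIM, NUM_EXPERTS = 8, 4  # hypothetical feature and expert counts

# Hypothetical learned parameters (random here, for illustration only).
expert_W = rng.standard_normal((NUM_EXPERTS, DIM, DIM)) * 0.1  # one projection per expert
gate_W = rng.standard_normal((DIM, NUM_EXPERTS)) * 0.1         # routing gate

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def route(features):
    """Dynamically fuse expert outputs per input feature.

    features: (batch, DIM) modality-agnostic embeddings.
    Returns a (batch, DIM) convex combination of each expert's
    projection, weighted by a per-sample softmax gate.
    """
    gates = softmax(features @ gate_W)                          # (batch, E)
    expert_out = np.einsum('bd,edk->bek', features, expert_W)   # (batch, E, DIM)
    return np.einsum('be,bek->bk', gates, expert_out)

x = rng.standard_normal((2, DIM))   # two input tokens
fused = route(x)                    # (2, DIM) routed features
```

In a real system the gate and experts would be trained end-to-end, and the gate input could include a modality indicator so that routing adapts to whichever query modalities are present.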
📝 Abstract
In real-world scenarios, person re-identification (ReID) aims to identify a person of interest via a descriptive query, regardless of whether that query is a single modality or a combination of multiple modalities. However, existing methods and datasets remain constrained to a limited set of modalities and therefore fail to meet this requirement. We thus investigate a new and challenging problem called Omni Multi-modal Person Re-identification (OM-ReID), which aims to achieve effective retrieval with varying multi-modal queries. To address the scarcity of suitable data, we construct ORBench, the first high-quality multi-modal dataset comprising 1,000 unique identities across five modalities: RGB, infrared, color pencil, sketch, and textual description. ORBench also offers substantial diversity, e.g., in painting perspectives and textual information, making it an ideal platform for follow-up investigations in OM-ReID. Moreover, we propose ReID5o, a novel multi-modal learning framework for person ReID that enables synergistic fusion and cross-modal alignment of arbitrary modality combinations in a single model via a proposed unified encoding and multi-expert routing mechanism. Extensive experiments verify the advantages and practicality of ORBench. A wide range of possible models have been evaluated and compared on it, with our proposed ReID5o model achieving the best performance. The dataset and code will be made publicly available at https://github.com/Zplusdragon/ReID5o_ORBench.
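Cross-modal alignment of the kind the abstract describes is commonly trained with a symmetric contrastive (InfoNCE-style) objective over paired embeddings from two modalities. The sketch below shows that generic objective in numpy; it is an assumption-laden illustration of the standard technique, not the paper's actual loss, and the function name and `temperature` value are hypothetical.

```python
import numpy as np

def l2norm(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def cross_modal_infonce(a, b, temperature=0.07):
    """Symmetric InfoNCE loss aligning paired embeddings from two
    modalities (e.g., RGB image vs. text description).

    a, b: (batch, dim) arrays; row i of each encodes the same identity.
    Pulls matched pairs together and pushes mismatched pairs apart,
    averaged over both retrieval directions (a->b and b->a).
    """
    a, b = l2norm(a), l2norm(b)
    logits = a @ b.T / temperature          # (batch, batch) cosine similarities
    labels = np.arange(len(a))              # positives lie on the diagonal

    def ce(l):
        # cross-entropy of each row's softmax against its diagonal entry
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    return 0.5 * (ce(logits) + ce(logits.T))
```

With five modalities, such a loss can be applied over every available modality pair for an identity, which is one simple way to encourage the semantic consistency across heterogeneous modalities that the abstract mentions.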