FlexiReID: Adaptive Mixture of Expert for Multi-Modal Person Re-Identification

📅 2025-10-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing multimodal person re-identification (Re-ID) methods support only fixed cross-modal pairings, limiting flexibility for arbitrary query–gallery combinations and hindering real-world deployment. This paper proposes the first unified Re-ID framework enabling free combinatorial matching across four modalities: RGB, infrared, sketch, and text. Our approach introduces three key innovations: (1) an adaptive Mixture-of-Experts (MoE) architecture that dynamically weights and fuses modality-specific features; (2) a cross-modal query fusion module that achieves query-modality-agnostic feature alignment; and (3) CIRS-PEDES—the first unified benchmark covering seven distinct retrieval modes. Extensive experiments across diverse scenarios demonstrate significant improvements over state-of-the-art methods, with strong generalization across unseen modality combinations and high practical deployability.

📝 Abstract
Multimodal person re-identification (Re-ID) aims to match pedestrian images across different modalities. However, most existing methods focus on limited cross-modal settings and fail to support arbitrary query-retrieval combinations, hindering practical deployment. We propose FlexiReID, a flexible framework that supports seven retrieval modes across four modalities: RGB, infrared, sketch, and text. FlexiReID introduces an adaptive mixture-of-experts (MoE) mechanism to dynamically integrate diverse modality features and a cross-modal query fusion module to enhance multimodal feature extraction. To facilitate comprehensive evaluation, we construct CIRS-PEDES, a unified dataset extending four popular Re-ID datasets to include all four modalities. Extensive experiments demonstrate that FlexiReID achieves state-of-the-art performance and offers strong generalization in complex scenarios.
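The adaptive MoE fusion described in the abstract can be illustrated with a minimal sketch: one lightweight expert transforms each modality's feature, and a gating function assigns per-modality weights before summing. All function names, weight shapes, and dimensions below are illustrative assumptions, not the paper's actual architecture or API.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def adaptive_moe_fusion(feats, expert_ws, gate_w):
    """Hypothetical adaptive mixture-of-experts fusion.

    feats:     list of M modality features, each of shape (B, D)
               (e.g. M = 4 for RGB, infrared, sketch, text)
    expert_ws: list of M per-modality expert weight matrices, each (D, D)
    gate_w:    gating weight matrix of shape (M * D, M)
    returns:   fused feature of shape (B, D)
    """
    # each expert transforms its own modality's feature -> (B, M, D)
    expert_out = np.stack([f @ w for f, w in zip(feats, expert_ws)], axis=1)
    # the gate scores all modalities from the concatenated features -> (B, M)
    weights = softmax(np.concatenate(feats, axis=-1) @ gate_w)
    # weighted sum over experts -> (B, D)
    return (weights[..., None] * expert_out).sum(axis=1)
```

In a trained model the expert and gate weights would be learned; here they are plain arrays so the data flow of "dynamically weight, then fuse" is visible in a few lines.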
Problem

Research questions and friction points this paper is trying to address.

Existing methods support only fixed cross-modal pairings, not arbitrary query-retrieval combinations
Modality-specific features are difficult to integrate dynamically within a single model
No unified benchmark covers all four modalities for comprehensive evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive mixture-of-experts mechanism integrates modality features
Cross-modal query fusion module enhances feature extraction
Unified dataset supports four modalities for evaluation
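The query-modality-agnostic matching that the fusion module enables can be sketched as a simple retrieval step: once any query, whatever its modality, has been projected into a shared embedding space, the gallery is ranked by cosine similarity. The function below is an illustrative assumption about how such ranking could work, not the paper's implementation.

```python
import numpy as np

def rank_gallery(query, gallery):
    """Rank gallery embeddings by cosine similarity to a query embedding.

    query:   shared-space query feature of shape (D,)
    gallery: shared-space gallery features of shape (N, D)
    returns: gallery indices sorted from most to least similar
    """
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = g @ q          # cosine similarity per gallery item, shape (N,)
    return np.argsort(-sims)
```

Because matching happens purely in the shared space, the same ranking code serves every retrieval mode (RGB-to-text, sketch-to-infrared, and so on), which is the deployability point the paper emphasizes.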
Zhen Sun
DSA Thrust, HKUST(GZ)
LLM security
Lei Tan
National University of Singapore.
Yunhang Shen
Tencent YouTu Lab.
Chengmao Cai
Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005, P.R. China.
Xing Sun
Tencent YouTu Lab
LLM, MLLM, Agent
Pingyang Dai
Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005, P.R. China.
Liujuan Cao
Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005, P.R. China.
Rongrong Ji
Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005, P.R. China. Institute of Artificial Intelligence, Xiamen University, Xiamen, China.