Instruct-ReID++: Towards Universal Purpose Instruction-Guided Person Re-identification

📅 2024-05-28
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
Existing person re-identification (ReID) research is largely confined to single-scenario settings and lacks cross-task generalization capability. To address this, we propose Instruct-ReID—a novel paradigm that unifies six canonical ReID tasks (image retrieval, cross-camera, cross-modal, cross-domain, cross-temporal, and cross-identity matching) into a unified, instruction-driven retrieval framework. Our key contributions are: (1) the first formal definition of the Instruct-ReID task; (2) OmniReID++, a large-scale, multi-scenario benchmark comprising ten heterogeneous test sets; (3) a dual-model architecture—task-aware IRM and task-agnostic IRM++—integrating multimodal instruction encoding, adaptive triplet loss, and memory-augmented learning; and (4) a dual evaluation protocol. Extensive experiments demonstrate state-of-the-art performance across all OmniReID++ benchmarks. The code, models, and dataset are publicly released.

Technology Category

Application Category

📝 Abstract
Human intelligence can retrieve any person according to both visual and language descriptions. However, the current computer vision community studies specific person re-identification (ReID) tasks in different scenarios separately, which limits the applications in the real world. This paper strives to resolve this problem by proposing a novel instruct-ReID task that requires the model to retrieve images according to the given image or language instructions. Instruct-ReID is the first exploration of a general ReID setting, where existing 6 ReID tasks can be viewed as special cases by assigning different instructions. To facilitate research in this new instruct-ReID task, we propose a large-scale OmniReID++ benchmark equipped with diverse data and comprehensive evaluation methods e.g., task specific and task-free evaluation settings. In the task-specific evaluation setting, gallery sets are categorized according to specific ReID tasks. We propose a novel baseline model, IRM, with an adaptive triplet loss to handle various retrieval tasks within a unified framework. For task-free evaluation setting, where target person images are retrieved from task-agnostic gallery sets, we further propose a new method called IRM++ with novel memory bank-assisted learning. Extensive evaluations of IRM and IRM++ on OmniReID++ benchmark demonstrate the superiority of our proposed methods, achieving state-of-the-art performance on 10 test sets. The datasets, the model, and the code will be available at https://github.com/hwz-zju/Instruct-ReID
Problem

Research questions and friction points this paper is trying to address.

Develop universal instruction-guided person re-identification (ReID)
Unify diverse ReID tasks via adaptable visual/language instructions
Create scalable benchmark and model for task-specific/free evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces instruct-ReID task for versatile retrieval
Proposes IRM with adaptive triplet loss
Develops IRM++ with memory bank-assisted learning
🔎 Similar Papers
No similar papers found.
W
Weizhen He
College of Electrical Engineering, Zhejiang University, Hangzhou, 310027, China
Y
Yiheng Deng
College of Electrical Engineering, Zhejiang University, Hangzhou, 310027, China
Y
Yunfeng Yan
College of Electrical Engineering, Zhejiang University, Hangzhou, 310027, China
F
Feng Zhu
SenseTime Group Limited, China
Y
Yizhou Wang
Shanghai AI Laboratory, Shanghai, 200232, China
Lei Bai
Lei Bai
Shanghai AI Laboratory
Foundation ModelScience IntelligenceMulti-Agent SystemAutonomous Discovery
Q
Qingsong Xie
Shanghai Jiao Tong University, Shanghai, 200240, China
Donglian Qi
Donglian Qi
Zhejiang University
Power systemsControl
W
Wanli Ouyang
Shanghai AI Laboratory, Shanghai, 200232, China
S
Shixiang Tang
Shanghai AI Laboratory, Shanghai, 200232, China