Instruct-ReID: A Multi-Purpose Person Re-Identification Task with Instructions

๐Ÿ“… 2023-06-13
๐Ÿ›๏ธ Computer Vision and Pattern Recognition
๐Ÿ“ˆ Citations: 17
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Existing person re-identification (ReID) methods are confined to single-scene settings and generalize poorly across diverse tasks. To address this, the paper proposes an instruction-driven ReID paradigm that unifies six heterogeneous tasks, including image- and text-based querying, cross-modal matching, and robustness to clothing change. Key contributions: (1) the first formal definition of instruction-driven generic ReID; (2) OmniReID, a large-scale, multi-scenario benchmark covering varied environments and modalities; and (3) an instruction-conditioned retrieval architecture with an adaptive triplet loss, enabling task-agnostic generalization without finetuning. Evaluation across nine benchmark datasets demonstrates substantial gains: +24.9% mAP for language-instructed ReID, up to +11.2% mAP for clothes-changing ReID, +11.7% mAP for clothes-template-based ReID using only RGB images, and +4.3% mAP for visible-infrared ReID.
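The adaptive triplet loss mentioned in the summary is not spelled out on this page. As a rough illustration, the sketch below implements a standard triplet loss with a per-call margin adjustment standing in for the paper's instruction-aware adaptation; the function name and the margin-handling scheme are assumptions, not the paper's exact formulation:

```python
import numpy as np

def adaptive_triplet_loss(anchor, positive, negative,
                          base_margin=0.3, margin_adjust=0.0):
    """Triplet loss with an adjustable margin (illustrative sketch only).

    `margin_adjust` stands in for the instruction-dependent adaptation
    described in the paper; here it is simply added to the base margin.
    Inputs are (batch, dim) feature arrays.
    """
    d_pos = np.linalg.norm(anchor - positive, axis=-1)  # anchor-positive distances
    d_neg = np.linalg.norm(anchor - negative, axis=-1)  # anchor-negative distances
    margin = base_margin + margin_adjust
    # Hinge: penalize triplets where the negative is not farther than
    # the positive by at least `margin`.
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()
```

With a well-separated negative the loss vanishes; with a negative as close as the positive it equals the margin, which is the usual behavior the adaptive variant then modulates per instruction.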
๐Ÿ“ Abstract
Human intelligence can retrieve any person according to both visual and language descriptions. However, the current computer vision community studies specific person re-identification (ReID) tasks in different scenarios separately, which limits the applications in the real world. This paper strives to resolve this problem by proposing a new instruct-ReID task that requires the model to retrieve images according to the given image or language instructions. Our instruct-ReID is a more general ReID setting, where 6 existing ReID tasks can be viewed as special cases by designing different instructions. We propose a large-scale OmniReID benchmark and an adaptive triplet loss as a baseline method to facilitate research in this new setting. Experimental results show that the proposed multi-purpose ReID model, trained on our OmniReID benchmark without finetuning, can improve +0.5%, +0.6%, +7.7% mAP on Market1501, MSMT17, CUHK03 for traditional ReID, +6.4%, +7.1%, +11.2% mAP on PRCC, VC-Clothes, LTCC for clothes-changing ReID, +11.7% mAP on COCAS+ real2 for clothes-template-based clothes-changing ReID when using only RGB images, +24.9% mAP on COCAS+ real2 for our newly defined language-instructed ReID, +4.3% mAP on LLCM for visible-infrared ReID, and +2.6% mAP on CUHK-PEDES for text-to-image ReID. The datasets, the model, and code are available at https://github.com/hwz-zju/Instruct-ReID.
Problem

Research questions and friction points this paper is trying to address.

Existing methods handle each ReID scenario (traditional, clothes-changing, cross-modal, text-based) separately, limiting real-world deployment
No unified formulation lets a single model retrieve a person from either image or language instructions
General-purpose ReID lacks a large-scale, multi-scenario benchmark for training and evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Defines instruct-ReID, a unified task in which 6 existing ReID settings become special cases of instruction design
Builds the large-scale OmniReID benchmark and proposes an adaptive triplet loss as a baseline
Improves mAP on all evaluated benchmarks without finetuning, e.g. +24.9% on COCAS+ real2 for language-instructed ReID
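The instruction-conditioned retrieval idea behind these contributions can be illustrated independently of the paper's architecture. The sketch below fuses an image embedding with an instruction embedding by normalized addition and ranks a gallery by cosine similarity; the fusion rule and function names are assumptions for illustration, not the paper's learned method:

```python
import numpy as np

def instruct_query(image_feat, instr_feat, gallery_feats):
    """Rank a gallery against an instruction-conditioned query (sketch).

    The query is formed by summing the L2-normalized image and
    instruction embeddings; a trained model would use a learned
    instruction-conditioned fusion instead. Returns gallery indices
    sorted from most to least similar.
    """
    def l2norm(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    query = l2norm(l2norm(image_feat) + l2norm(instr_feat))
    sims = l2norm(gallery_feats) @ query  # cosine similarity per gallery item
    return np.argsort(-sims)              # descending similarity order
```

This mirrors the setting's key property: the same retrieval interface serves any of the unified tasks, because only the instruction embedding changes (e.g. a clothes template, a language description, or an infrared cue encoder output).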
๐Ÿ”Ž Similar Papers
No similar papers found.
Weizhen He
Zhejiang University
Shixiang Tang
The University of Sydney
Yihe Deng
University of California, Los Angeles
Machine Learning · Natural Language Processing
Qihao Chen
Liaoning Technical University
Qingsong Xie
Shanghai Jiaotong University
Yizhou Wang
Shanghai AI Laboratory
Lei Bai
Shanghai AI Laboratory
Foundation Model · Science Intelligence · Multi-Agent System · Autonomous Discovery
Feng Zhu
SenseTime Research
Rui Zhao
SenseTime Research
Wanli Ouyang
The University of Sydney, Shanghai AI Laboratory
Donglian Qi
Zhejiang University
Power Systems · Control
Yunfeng Yan
Zhejiang University