LLaVA-ReID: Selective Multi-image Questioner for Interactive Person Re-Identification

📅 2025-04-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of fragmented, ambiguous, and incomplete witness descriptions in real-world scenarios, this paper introduces Interactive Person Re-Identification (Inter-ReID): a novel task enabling robust cross-camera retrieval via multi-turn vision-language dialogue to dynamically refine textual descriptions. Methodologically, we (1) formally define the Inter-ReID paradigm; (2) construct the first fine-grained, multi-type question-answering dialogue dataset for person retrieval; (3) propose a forward selection supervision strategy that prioritizes questions yielding maximal information gain; and (4) design a LLaVA-based, multi-image-aware QA model that jointly encodes visual features and textual context for conditional question generation, augmented by fine-grained attribute decomposition to guide dialogue modeling. Experiments demonstrate significant improvements over state-of-the-art baselines on both the proposed Inter-ReID benchmark and standard text-based ReID tasks.

📝 Abstract
Traditional text-based person ReID assumes that person descriptions from witnesses are complete and provided at once. However, in real-world scenarios, such descriptions are often partial or vague. To address this limitation, we introduce a new task called interactive person re-identification (Inter-ReID). Inter-ReID is a dialogue-based retrieval task that iteratively refines initial descriptions through ongoing interactions with the witnesses. To facilitate the study of this new task, we construct a dialogue dataset that incorporates multiple types of questions by decomposing fine-grained attributes of individuals. We further propose LLaVA-ReID, a question model that generates targeted questions based on visual and textual contexts to elicit additional details about the target person. Leveraging a looking-forward strategy, we prioritize the most informative questions as supervision during training. Experimental results on both Inter-ReID and text-based ReID benchmarks demonstrate that LLaVA-ReID significantly outperforms baselines.
Problem

Research questions and friction points this paper is trying to address.

Addresses incomplete witness descriptions in person ReID
Introduces interactive dialogue-based person re-identification task
Proposes selective multi-image questioning to refine details
Innovation

Methods, ideas, or system contributions that make the work stand out.

Interactive dialogue-based person re-identification task
Multi-type question dataset from fine-grained attributes
Looking-forward strategy for informative question prioritization
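The looking-forward strategy above can be read as greedy expected-information-gain question selection: simulate each candidate question against the current belief over gallery identities and ask the one that, in expectation, shrinks the candidate set the most. The sketch below is illustrative only, not the paper's implementation: it assumes a uniform belief over candidates and deterministic attribute answers, and all names (`entropy`, `select_question`, etc.) are hypothetical.

```python
import math
from collections import Counter

def entropy(candidates):
    """Shannon entropy of a uniform belief over the remaining candidates."""
    n = len(candidates)
    return math.log2(n) if n > 0 else 0.0

def expected_entropy_after(question, candidates, answer_of):
    """Expected entropy after asking `question`, assuming each candidate is
    equally likely to be the target and would answer deterministically.
    Candidates are grouped by the answer they would give; each group of size
    k survives with probability k/n and leaves log2(k) bits of uncertainty."""
    groups = Counter(answer_of(c, question) for c in candidates)
    n = len(candidates)
    return sum((k / n) * math.log2(k) for k in groups.values())

def select_question(questions, candidates, answer_of):
    """Greedy looking-forward selection: pick the question whose simulated
    answer maximally reduces expected entropy (maximal information gain)."""
    base = entropy(candidates)
    return max(questions,
               key=lambda q: base - expected_entropy_after(q, candidates, answer_of))
```

For example, with four candidates where shirt color takes three values but bag/no-bag splits them only two-and-two, asking about the shirt yields the larger expected gain, so `select_question` returns that attribute first.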
Yiding Lu
College of Computer Science, Sichuan University, China
Mouxing Yang
Sichuan University
Multi-modal · Multi-view · Noisy Correspondence
Dezhong Peng
Sichuan University
Multi-modal Learning · Multimedia Analysis · Neural Network
Peng Hu
College of Computer Science, Sichuan University, China
Yijie Lin
College of Computer Science, Sichuan University, China
Xi Peng
College of Computer Science, Sichuan University, China; National Key Laboratory of Fundamental Algorithms and Models for Engineering Numerical Simulation, Sichuan University, China