LLaVA-ReID: Selective Multi-image Questioner for Interactive Person Re-Identification

📅 2025-04-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the fragmented, ambiguous, and incomplete witness descriptions found in real-world scenarios, this paper introduces Interactive Person Re-Identification (Inter-ReID): a novel task that enables robust cross-camera retrieval through multi-turn vision-language dialogue, refining the textual description at each turn. Methodologically, the paper (1) formally defines the Inter-ReID paradigm; (2) constructs the first fine-grained, multi-type question-answering dialogue dataset for person retrieval, built by decomposing fine-grained person attributes; (3) proposes a looking-forward supervision strategy that prioritizes the most informative questions during training; and (4) designs LLaVA-ReID, a LLaVA-based, multi-image-aware question model that jointly encodes visual features and textual context to generate targeted questions. Experiments demonstrate significant improvements over state-of-the-art baselines on both the proposed Inter-ReID benchmark and standard text-based ReID benchmarks.

📝 Abstract

Traditional text-based person ReID assumes that person descriptions from witnesses are complete and provided at once. However, in real-world scenarios, such descriptions are often partial or vague. To address this limitation, we introduce a new task called interactive person re-identification (Inter-ReID). Inter-ReID is a dialogue-based retrieval task that iteratively refines initial descriptions through ongoing interactions with the witnesses. To facilitate the study of this new task, we construct a dialogue dataset that incorporates multiple types of questions by decomposing fine-grained attributes of individuals. We further propose LLaVA-ReID, a question model that generates targeted questions based on visual and textual contexts to elicit additional details about the target person. Leveraging a looking-forward strategy, we prioritize the most informative questions as supervision during training. Experimental results on both Inter-ReID and text-based ReID benchmarks demonstrate that LLaVA-ReID significantly outperforms baselines.
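The iterative refinement described in the abstract can be pictured as a simple loop: retrieve, ask, fold the answer into the description, retrieve again. The following is only a toy sketch under stated assumptions, not the paper's system. The attribute-dictionary format, the overlap-count `score`, and the names `interactive_retrieval`, `ask_next_slot`, and `witness_answer` are all hypothetical; the actual model works with images and free-form text.

```python
from typing import Callable, Optional

# Hypothetical description format: attribute slot -> value.
Attributes = dict[str, str]


def score(query: Attributes, candidate: Attributes) -> int:
    """Count query attributes that the candidate matches (toy similarity)."""
    return sum(1 for slot, value in query.items() if candidate.get(slot) == value)


def rank_gallery(query: Attributes, gallery: dict[str, Attributes]) -> list[str]:
    """Return gallery ids, best match first; ties are broken by id."""
    return sorted(gallery, key=lambda pid: (-score(query, gallery[pid]), pid))


def interactive_retrieval(
    initial: Attributes,
    gallery: dict[str, Attributes],
    ask: Callable[[Attributes, set[str]], Optional[str]],
    answer: Callable[[str], Optional[str]],
    max_rounds: int = 3,
) -> tuple[Attributes, list[list[str]]]:
    """Refine a partial description over several question/answer rounds."""
    query = dict(initial)
    asked: set[str] = set()
    rankings = [rank_gallery(query, gallery)]  # ranking before any question
    for _ in range(max_rounds):
        slot = ask(query, asked)
        if slot is None:  # the questioner has nothing left to ask
            break
        asked.add(slot)
        reply = answer(slot)
        if reply is not None:  # the witness may not remember this attribute
            query[slot] = reply
        rankings.append(rank_gallery(query, gallery))
    return query, rankings


# Toy data: the witness saw person "p2" but initially recalls only the jacket.
GALLERY = {
    "p1": {"upper": "red jacket", "lower": "jeans"},
    "p2": {"upper": "red jacket", "lower": "shorts", "head": "cap"},
    "p3": {"upper": "blue coat", "lower": "jeans", "head": "cap"},
}
SLOTS = ["upper", "lower", "head"]


def ask_next_slot(query: Attributes, asked: set[str]) -> Optional[str]:
    """Naive questioner: ask about the first slot not yet covered."""
    for slot in SLOTS:
        if slot not in query and slot not in asked:
            return slot
    return None


def witness_answer(slot: str) -> Optional[str]:
    """Simulated witness who answers from the target's true attributes."""
    return GALLERY["p2"].get(slot)


final_query, rankings = interactive_retrieval(
    {"upper": "red jacket"}, GALLERY, ask_next_slot, witness_answer, max_rounds=2
)
```

In this toy run the target moves from second place to first after a single question about the lower-body clothing. The naive questioner here asks in a fixed order; choosing which question to ask is the part the paper's question model is meant to learn.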

Problem

Research questions and friction points this paper is trying to address.

Addresses incomplete witness descriptions in person ReID

Introduces interactive dialogue-based person re-identification task

Proposes selective multi-image questioning to refine details

Innovation

Methods, ideas, or system contributions that make the work stand out.

Interactive dialogue-based person re-identification task

Multi-type question dataset from fine-grained attributes

Looking-forward strategy for informative question prioritization
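One plausible reading of the looking-forward idea is sketched below: during training the target identity is known, so each candidate question can be "played forward" with the true answer, and the question that most improves the target's retrieval rank is kept as the supervision signal. This is an illustrative assumption, not the paper's confirmed procedure. Rank improvement stands in for informativeness, and `target_rank`, `select_question`, and the attribute-dictionary format are hypothetical.

```python
from typing import Optional

# Hypothetical description format: attribute slot -> value.
Attributes = dict[str, str]


def target_rank(query: Attributes, gallery: dict[str, Attributes], target_id: str) -> int:
    """Pessimistic 1-based rank: ties with the target count against it."""
    def score(candidate: Attributes) -> int:
        return sum(1 for slot, value in query.items() if candidate.get(slot) == value)

    target_score = score(gallery[target_id])
    ahead = sum(
        1
        for pid, attrs in gallery.items()
        if pid != target_id and score(attrs) >= target_score
    )
    return 1 + ahead


def select_question(
    query: Attributes,
    gallery: dict[str, Attributes],
    target_id: str,
    candidate_slots: list[str],
) -> tuple[Optional[str], int]:
    """Look one step ahead and keep the slot whose answer helps the most."""
    truth = gallery[target_id]  # known only at training time
    best_slot: Optional[str] = None
    best_rank = target_rank(query, gallery, target_id)
    for slot in candidate_slots:
        if slot in query or slot not in truth:
            continue  # already known, or the target has no such attribute
        simulated = {**query, slot: truth[slot]}  # simulate the witness's answer
        rank = target_rank(simulated, gallery, target_id)
        if rank < best_rank:  # strictly better than anything seen so far
            best_slot, best_rank = slot, rank
    return best_slot, best_rank


GALLERY = {
    "p1": {"upper": "red jacket", "lower": "jeans", "bag": "backpack"},
    "p2": {"upper": "red jacket", "lower": "shorts", "bag": "backpack"},
    "p3": {"upper": "red jacket", "lower": "jeans", "bag": "tote"},
}

# Asking about the bag only separates p2 from p3; asking about the
# lower-body clothing separates p2 from everyone, so it is selected.
chosen, rank_after = select_question(
    {"upper": "red jacket"}, GALLERY, "p2", ["bag", "lower"]
)
```

When no candidate question improves the rank, `select_question` returns `None` as the slot, which a training pipeline could treat as "no useful question for this turn".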
