ChatReID: Open-ended Interactive Person Retrieval via Hierarchical Progressive Tuning for Vision Language Models

📅 2025-02-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing LVLM-based person re-identification (Re-ID) methods suffer from rigid interaction and poor generalization due to reliance on fixed textual templates and non-VQA-style reasoning. This paper introduces the first open-ended interactive Re-ID framework, enabling multimodal natural language question-answering (VQA) queries grounded in both text and images—thereby breaking free from template-constrained prompting. Our core contributions are: (1) a Hierarchical Progressive Tuning (HPT) strategy that enables stepwise discrimination—from coarse-grained attributes to fine-grained identity; and (2) an Re-ID–specific LVLM fine-tuning paradigm integrating multimodal alignment, parameter-efficient adaptation, and dynamic prompt evolution. Evaluated across four distinct settings and ten benchmark datasets, our method consistently surpasses state-of-the-art approaches, delivering significant improvements in retrieval accuracy, interactive flexibility, and deployment practicality.

Technology Category

Application Category

📝 Abstract
Person re-identification (Re-ID) is a critical task in human-centric intelligent systems, enabling consistent identification of individuals across different camera views using multi-modal query information. Recent studies have successfully integrated LVLMs with person Re-ID, yielding promising results. However, existing LVLM-based methods face several limitations. They rely on extracting textual embeddings from fixed templates, which are used either as intermediate features for image representation or for prompt tuning in domain-specific tasks. Furthermore, they are unable to adopt the VQA inference format, significantly restricting their broader applicability. In this paper, we propose a novel, versatile, one-for-all person Re-ID framework, ChatReID. Our approach introduces a Hierarchical Progressive Tuning (HPT) strategy, which ensures fine-grained identity-level retrieval by progressively refining the model's ability to distinguish pedestrian identities. Extensive experiments demonstrate that our approach outperforms SOTA methods across ten benchmarks in four different Re-ID settings, offering enhanced flexibility and user-friendliness. ChatReID provides a scalable, practical solution for real-world person Re-ID applications, enabling effective multi-modal interaction and fine-grained identity discrimination.
Problem

Research questions and friction points this paper is trying to address.

Enhancing person re-identification accuracy
Overcoming fixed template limitations
Improving VQA inference adaptability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Progressive Tuning strategy
Multi-modal interaction enhancement
Fine-grained identity discrimination
🔎 Similar Papers
No similar papers found.