FitPro: A Zero-Shot Framework for Interactive Text-based Pedestrian Retrieval in Open World

📅 2025-09-20
🤖 AI Summary
Text-based person retrieval (TPR) suffers from poor generalization and shallow semantic understanding in open-world interactive scenarios. To address these challenges, we propose FitPro, a zero-shot framework with three core innovations: (i) Feature Contrastive Decoding, which mitigates semantic drift in generated descriptions; (ii) Incremental Semantic Mining, which models pedestrian semantics globally across multi-turn interactions and improves robustness to viewpoint changes; and (iii) Query-aware Hierarchical Retrieval, which adapts the retrieval pipeline to the query type and to multi-modal, multi-view inputs. FitPro combines prompt-guided decoding, multi-view feature alignment, and a dynamic retrieval pipeline, enabling natural-language-driven zero-shot transfer across scenarios. Evaluated on five public benchmarks, FitPro consistently outperforms state-of-the-art methods by significant margins, demonstrating strong generalization and practical deployability.
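
The abstract describes FCD only at a high level ("prompt-guided contrastive decoding" over denoised images). As a rough illustration of the general technique, the sketch below implements one greedy step of visual contrastive decoding: next-token log-probabilities conditioned on a clean (denoised) image are contrasted with those conditioned on a degraded view, and an adaptive plausibility cutoff stops the contrast from promoting unlikely tokens. The two-view setup, the function names, and the `alpha`/`beta` defaults are assumptions for illustration, not FitPro's published configuration.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a vocabulary."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def contrastive_next_token(
    clean_logits: Sequence[float],     # logits given the denoised image + prompt (assumed)
    contrast_logits: Sequence[float],  # logits given a degraded view of the image (assumed)
    alpha: float = 1.0,                # contrast strength (hypothetical default)
    beta: float = 0.1,                 # plausibility threshold (hypothetical default)
) -> int:
    """Return the next-token index under a generic contrastive-decoding rule.

    Hypothetical sketch, not FitPro's FCD: tokens that the degraded view favours
    (i.e. driven by language priors rather than visual evidence) are penalised.
    """
    if len(clean_logits) != len(contrast_logits):
        raise ValueError("both logit vectors must cover the same vocabulary")
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must be in (0, 1]")
    clean_lp = log_softmax(clean_logits)
    contrast_lp = log_softmax(contrast_logits)
    # Adaptive plausibility: keep tokens with p_clean >= beta * max(p_clean).
    cutoff = max(clean_lp) + math.log(beta)
    best_idx, best_score = -1, -math.inf
    for i, (c, n) in enumerate(zip(clean_lp, contrast_lp)):
        if c < cutoff:
            continue
        score = (1.0 + alpha) * c - alpha * n
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx


vocab = ["red", "blue", "jacket", "hat"]
clean = [2.0, 1.9, 0.0, -1.0]
degraded = [2.5, 0.5, 0.0, -1.0]
print(vocab[contrastive_next_token(clean, degraded)])  # blue
```

Greedy decoding on the clean logits alone would emit "red", which the degraded view also strongly prefers; the contrastive rule picks "blue" instead. When both views give identical logits, the score reduces to the clean log-probability, so the rule falls back to greedy decoding.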

📝 Abstract
Text-based Pedestrian Retrieval (TPR) aims to retrieve specific target pedestrians in visual scenes according to natural language descriptions. Although existing methods have made progress under constrained settings, interactive retrieval in open-world scenarios still suffers from limited model generalization and insufficient semantic understanding. To address these challenges, we propose FitPro, an open-world interactive zero-shot TPR framework with enhanced semantic comprehension and cross-scene adaptability. FitPro has three innovative components: Feature Contrastive Decoding (FCD), Incremental Semantic Mining (ISM), and Query-aware Hierarchical Retrieval (QHR). FCD integrates prompt-guided contrastive decoding to generate high-quality structured pedestrian descriptions from denoised images, effectively alleviating semantic drift in zero-shot scenarios. ISM constructs holistic pedestrian representations from multi-view observations to achieve global semantic modeling in multi-turn interactions, thereby improving robustness against viewpoint shifts and fine-grained variations in descriptions. QHR dynamically optimizes the retrieval pipeline according to the query type, enabling efficient adaptation to multi-modal and multi-view inputs. Extensive experiments on five public datasets under two evaluation protocols demonstrate that FitPro effectively overcomes the generalization limitations and semantic modeling constraints of existing methods in interactive retrieval, paving the way for practical deployment. The code and data will be released at https://github.com/lilo4096/FitPro-Interactive-Person-Retrieval.
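
The abstract does not specify how ISM fuses observations across views and dialogue turns. One simple, hypothetical way to accumulate a holistic description is to keep per-attribute vote counts together with a running mean of per-view embeddings, as sketched below. `HolisticProfile`, the attribute keys, and the majority-vote rule are illustrative assumptions, not the paper's method.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import DefaultDict, Dict, List, Optional, Sequence


@dataclass
class HolisticProfile:
    """Hypothetical incremental pedestrian profile built from multiple views/turns."""

    votes: DefaultDict[str, Counter] = field(default_factory=lambda: defaultdict(Counter))
    embedding: Optional[List[float]] = None
    n_views: int = 0

    def update(self, attributes: Dict[str, str], embedding: Sequence[float]) -> None:
        """Fold one observation (a camera view or a dialogue turn) into the profile."""
        for key, value in attributes.items():
            if value:  # empty string = attribute not visible in this view
                self.votes[key][value] += 1
        self.n_views += 1
        if self.embedding is None:
            self.embedding = list(embedding)
        else:
            if len(embedding) != len(self.embedding):
                raise ValueError("embedding size changed between views")
            # Incremental mean: m_n = m_{n-1} + (x_n - m_{n-1}) / n
            self.embedding = [
                m + (x - m) / self.n_views for m, x in zip(self.embedding, embedding)
            ]

    def summary(self) -> Dict[str, str]:
        """Majority value per attribute across all observations so far."""
        return {key: counts.most_common(1)[0][0] for key, counts in self.votes.items()}


profile = HolisticProfile()
profile.update({"upper": "red jacket", "bag": ""}, [1.0, 0.0])                # front view, bag hidden
profile.update({"upper": "red jacket", "bag": "black backpack"}, [0.0, 1.0])  # back view
profile.update({"upper": "orange jacket"}, [1.0, 1.0])                        # noisy side view
print(profile.summary())  # {'upper': 'red jacket', 'bag': 'black backpack'}
```

In this toy run, a single noisy view ("orange jacket") is outvoted, and an attribute hidden in one view (the backpack) is filled in by another. That is the kind of viewpoint robustness the abstract attributes to ISM, although FitPro's actual mechanism may differ.
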
Problem

Research questions and friction points this paper is trying to address.

Addressing limited model generalization in open-world interactive pedestrian retrieval
Overcoming insufficient semantic understanding in zero-shot text-based person search
Enhancing cross-scene adaptability for multi-view and multi-modal pedestrian retrieval
Innovation

Methods, ideas, or system contributions that make the work stand out.

Feature Contrastive Decoding generates structured descriptions from denoised images
Incremental Semantic Mining constructs holistic representations from multi-view observations
Query-aware Hierarchical Retrieval dynamically optimizes the retrieval pipeline according to the query type
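
The abstract says QHR adapts the retrieval pipeline to the query type but does not define its stages. The sketch below assumes a common coarse-to-fine design: a global-similarity shortlist followed by a re-ranking step whose scorer depends on the query type. `Candidate`, `query_type`, `hierarchical_retrieve`, the three query types, `coarse_k`, and `image_weight` are all hypothetical names chosen for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


@dataclass
class Candidate:
    pid: str                      # gallery identity (hypothetical)
    emb: Sequence[float]          # embedding in a shared text-image space (assumed)
    attrs: Dict[str, str] = field(default_factory=dict)  # structured description (assumed)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def query_type(attrs: Optional[Dict[str, str]], image_emb: Optional[Sequence[float]]) -> str:
    """Hypothetical router: an image query takes priority, then structured attributes."""
    if image_emb is not None:
        return "multimodal"
    if attrs:
        return "structured_text"
    return "free_text"


def hierarchical_retrieve(
    text_emb: Sequence[float],
    gallery: List[Candidate],
    attrs: Optional[Dict[str, str]] = None,
    image_emb: Optional[Sequence[float]] = None,
    coarse_k: int = 2,
    image_weight: float = 0.5,
) -> List[str]:
    # Stage 1 (coarse): cheap global similarity, keep only the top-k candidates.
    shortlist = sorted(gallery, key=lambda c: cosine(text_emb, c.emb), reverse=True)[:coarse_k]
    qtype = query_type(attrs, image_emb)

    def fine_score(c: Candidate) -> float:
        text_sim = cosine(text_emb, c.emb)
        if qtype == "multimodal":
            # Blend text and image evidence for text+image queries.
            return (1 - image_weight) * text_sim + image_weight * cosine(image_emb, c.emb)
        if qtype == "structured_text":
            # Reward exact matches on the queried attributes.
            return text_sim + sum(c.attrs.get(k) == v for k, v in attrs.items()) / len(attrs)
        return text_sim

    # Stage 2 (fine): re-rank only the shortlist with the query-type-specific scorer.
    return [c.pid for c in sorted(shortlist, key=fine_score, reverse=True)]


gallery = [
    Candidate("A", [1.0, 0.0], {"upper": "blue shirt"}),
    Candidate("B", [0.9, 0.1], {"upper": "red jacket"}),
    Candidate("C", [0.0, 1.0], {"upper": "red jacket"}),
]
print(hierarchical_retrieve([1.0, 0.0], gallery))                                 # ['A', 'B']
print(hierarchical_retrieve([1.0, 0.0], gallery, attrs={"upper": "red jacket"}))  # ['B', 'A']
```

Candidate C also wears a red jacket but is pruned in the coarse stage, which illustrates the usual trade-off of hierarchical retrieval: the cheap first stage bounds the cost of fine re-ranking at the risk of discarding true matches.
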
Authors
Zengli Luo
Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, China; School of Basic Medical Sciences, Guangzhou University of Chinese Medicine, Guangzhou, China
Canlong Zhang
Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, China; Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, China
Xiaochun Lu
Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, China; Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, China
Zhixin Li
Syracuse University School of Information Studies