An Empirical Study of Validating Synthetic Data for Text-Based Person Retrieval

📅 2025-03-28
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Text-based person retrieval (TBPR) faces critical challenges including privacy risks, high annotation costs, and limited diversity in real-world data. Method: This paper pioneers a systematic investigation into the feasibility and efficacy of purely synthetic data for TBPR. We propose a dual-path synthetic framework—cross-class generation and intra-class enhancement—that operates entirely without original images. Leveraging automated prompt engineering, diffusion models, and large language models, we jointly generate high-fidelity text–image pairs; additionally, we introduce a noise-robust learning strategy incorporating label correction and consistency regularization. Contribution/Results: We construct the first large-scale, fully automated synthetic TBPR dataset (millions of samples), achieving performance on par with—or surpassing—that of real-data baselines across multiple benchmarks. The code and dataset are publicly released, substantially lowering the barrier to TBPR research and development.

Technology Category

Application Category

📝 Abstract
Data plays a pivotal role in Text-Based Person Retrieval (TBPR) research. Mainstream research paradigm necessitates real-world person images with manual textual annotations for training models, posing privacy-sensitive and labor-intensive issues. Several pioneering efforts explore synthetic data for TBPR but still rely on real data, keeping the aforementioned issues and also resulting in diversity-deficient issue in synthetic datasets, thus impacting TBPR performance. Moreover, these works tend to explore synthetic data for TBPR through limited perspectives, leading to exploration-restricted issue. In this paper, we conduct an empirical study to explore the potential of synthetic data for TBPR, highlighting three key aspects. (1) We propose an inter-class image generation pipeline, in which an automatic prompt construction strategy is introduced to guide generative Artificial Intelligence (AI) models in generating various inter-class images without reliance on original data. (2) We develop an intra-class image augmentation pipeline, in which the generative AI models are applied to further edit the images for obtaining various intra-class images. (3) Building upon the proposed pipelines and an automatic text generation pipeline, we explore the effectiveness of synthetic data in diverse scenarios through extensive experiments. Additionally, we experimentally investigate various noise-robust learning strategies to mitigate the inherent noise in synthetic data. We will release the code, along with the synthetic large-scale dataset generated by our pipelines, which are expected to advance practical TBPR research.
Problem

Research questions and friction points this paper is trying to address.

Exploring synthetic data for Text-Based Person Retrieval (TBPR) to address privacy and labor issues
Overcoming diversity-deficient and exploration-restricted issues in existing synthetic datasets
Investigating noise-robust learning strategies to mitigate inherent noise in synthetic data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Inter-class image generation with automatic prompts
Intra-class image augmentation using generative AI
Automatic text generation for diverse synthetic data
🔎 Similar Papers
No similar papers found.
M
Min Cao
School of Computer Science and Technology, Soochow University
Z
ZiYin Zeng
School of Computer Science and Technology, Soochow University
Y
YuXin Lu
School of Computer Science and Technology, Soochow University
Mang Ye
Mang Ye
Professor, Wuhan University
Multimodal LearningPerson Re-identificationFederated Learning
Dong Yi
Dong Yi
Hong Kong Institute of Science and Innovation, Chinese Academy of Sciences
Computer VisionPattern Recognition
J
Jinqiao Wang
Wuhan AI Research