Diffusion-based Synthetic Data Generation for Visible-Infrared Person Re-Identification

📅 2025-03-16
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
To address the scarcity of real RGB-IR paired data, high annotation costs, and privacy constraints in Visible-Infrared Person Re-Identification (VI-ReID), this paper proposes DiVE, the first identity-aware diffusion model for VI-ReID. DiVE decouples identity representations from modality-specific features, enabling controllable cross-modal image synthesis without requiring real paired samples. It adapts text-to-image diffusion models (e.g., Stable Diffusion) into identity-preserving RGB-IR pair generators, integrating identity clustering-based representation learning with modality-conditional control mechanisms. When combined with the CAJ model on the LLCM dataset, training on DiVE-synthesized data improves mAP by about 9%, substantially alleviating the dependency on real data. This demonstrates both the efficacy and practicality of diffusion-generated synthetic data for VI-ReID.

📝 Abstract
The performance of models is intricately linked to the abundance of training data. In Visible-Infrared person Re-IDentification (VI-ReID) tasks, collecting and annotating large-scale images of each individual under various cameras and modalities is tedious, time-consuming, and costly, and must comply with data protection laws, posing a severe challenge in meeting dataset requirements. Current research investigates the generation of synthetic data as an efficient and privacy-preserving alternative to collecting real data in the field. However, a data synthesis technique tailored specifically to VI-ReID models has yet to be explored. In this paper, we present a novel data generation framework, dubbed Diffusion-based VI-ReID data Expansion (DiVE), that automatically obtains massive identity-preserving RGB-IR paired images by decoupling identity and modality, thereby improving the performance of VI-ReID models. Specifically, the identity representation is acquired from a set of samples sharing the same ID, whereas the modality of images is learned by fine-tuning Stable Diffusion (SD) on modality-specific data. DiVE extends text-driven image synthesis to identity-preserving RGB-IR multimodal image synthesis. This approach significantly reduces data collection and annotation costs by directly incorporating synthetic data into ReID model training. Experiments have demonstrated that VI-ReID models trained on synthetic data produced by DiVE consistently exhibit notable enhancements. In particular, the state-of-the-art method CAJ, trained with synthetic images, achieves an improvement of about 9% in mAP over the baseline on the LLCM dataset. Code: https://github.com/BorgDiven/DiVE
Problem

Research questions and friction points this paper is trying to address.

Addresses scarcity of training data in VI-ReID tasks.
Proposes synthetic data generation to reduce collection costs.
Enhances VI-ReID model performance using identity-preserving synthetic images.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion-based synthetic data generation for VI-ReID
Decouples identity and modality for image synthesis
Reduces data collection and annotation costs significantly
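As a rough illustration of the decoupling idea listed above, the sketch below pools features from images that share an ID into a modality-agnostic identity representation, then pairs it with an RGB or IR modality condition. All names here are hypothetical stand-ins: the generator call is a placeholder for the fine-tuned Stable Diffusion model the paper actually uses, and mean pooling stands in for its learned identity representation.

```python
from statistics import fmean

def identity_embedding(same_id_features):
    """Pool feature vectors of images sharing one ID into a single
    identity representation. Mean pooling is a simplification of the
    learned, clustering-based representation described in the paper."""
    dim = len(same_id_features[0])
    return [fmean(f[i] for f in same_id_features) for i in range(dim)]

def build_condition(identity_vec, modality):
    """Combine the modality-agnostic identity vector with a modality
    flag. A real pipeline would feed this conditioning to a diffusion
    model fine-tuned separately on RGB and IR data."""
    assert modality in ("rgb", "ir")
    return {"identity": identity_vec, "modality": modality}

# Toy per-image feature vectors for one person ID.
feats = [[0.2, 0.4], [0.6, 0.8]]
ident = identity_embedding(feats)

# The same identity vector drives both modalities, so the resulting
# synthetic RGB/IR pair shares identity but differs in modality.
pair = [build_condition(ident, m) for m in ("rgb", "ir")]
```

The key design point this mirrors is that identity and modality enter the generator as independent conditions, which is what lets DiVE synthesize aligned RGB-IR pairs without any real paired training samples.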
Wenbo Dai (Nanjing Tech University, Nanjing, China)
Lijing Lu (Peking University, Beijing, China)
Zhihang Li (Kwai Inc.)
Computer Vision · Generative Model · Video/Image Generation · LLM