Leveraging Prior Knowledge of Diffusion Model for Person Search

📅 2025-10-02
🤖 AI Summary
Existing person search methods rely on ImageNet-pretrained backbones, which may be suboptimal for capturing fine-grained identity cues and complex spatial context; moreover, sharing backbone features between detection and re-identification creates conflicting optimization objectives. To address these limitations, the paper proposes DiffPS, a framework that brings the prior knowledge of a pre-trained diffusion model to person search. DiffPS combines three modules: a Diffusion-Guided Region Proposal Network (DGRPN) for person localization, a Multi-Scale Frequency Refinement Network (MSFRN) that mitigates shape bias, and a Semantic-Adaptive Feature Aggregation Network (SFAN) that exploits text-aligned diffusion features. The design removes the optimization conflict between the two sub-tasks, and DiffPS reports state-of-the-art results on the CUHK-SYSU and PRW benchmarks.

📝 Abstract
Person search aims to jointly perform person detection and re-identification by localizing and identifying a query person within a gallery of uncropped scene images. Existing methods predominantly utilize ImageNet pre-trained backbones, which may be suboptimal for capturing the complex spatial context and fine-grained identity cues necessary for person search. Moreover, they rely on a shared backbone feature for both person detection and re-identification, leading to suboptimal features due to conflicting optimization objectives. In this paper, we propose DiffPS (Diffusion Prior Knowledge for Person Search), a novel framework that leverages a pre-trained diffusion model while eliminating the optimization conflict between two sub-tasks. We analyze key properties of diffusion priors and propose three specialized modules: (i) Diffusion-Guided Region Proposal Network (DGRPN) for enhanced person localization, (ii) Multi-Scale Frequency Refinement Network (MSFRN) to mitigate shape bias, and (iii) Semantic-Adaptive Feature Aggregation Network (SFAN) to leverage text-aligned diffusion features. DiffPS sets a new state-of-the-art on CUHK-SYSU and PRW.
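The retrieval step described in the abstract (identifying a query person among people detected in uncropped gallery images) can be illustrated with a minimal sketch. This is not the DiffPS pipeline: it assumes detection has already produced one embedding per detected box, and it assumes cosine similarity as the matching score, which is a common choice in re-identification but is not stated in the abstract. The function name `rank_gallery` and the toy data are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def rank_gallery(query_feat, gallery_feats, gallery_ids):
    """Rank detected gallery boxes by cosine similarity to the query.

    query_feat:    (D,) embedding of the cropped query person
    gallery_feats: (N, D) embeddings, one per detected box in the gallery
    gallery_ids:   N identifiers, here (image name, box index) tuples
    Assumption: embeddings come from some re-identification head.
    """
    q = l2_normalize(np.asarray(query_feat, dtype=np.float64))
    g = l2_normalize(np.asarray(gallery_feats, dtype=np.float64))
    scores = g @ q                              # cosine similarity per box
    order = np.argsort(-scores, kind="stable")  # highest score first
    return [(gallery_ids[i], float(scores[i])) for i in order]

# Toy example: three detected boxes across two scene images.
query = np.array([1.0, 0.0, 0.0])
gallery = np.array([
    [0.9, 0.1, 0.0],   # close to the query
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [-1.0, 0.0, 0.0],  # opposite direction
])
ids = [("scene_01.jpg", 0), ("scene_01.jpg", 1), ("scene_02.jpg", 0)]
ranking = rank_gallery(query, gallery, ids)
```

In this sketch the quality of the ranking depends entirely on the embeddings, which is why the abstract focuses on how detection and re-identification features are produced.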
Problem

Research questions and friction points this paper is trying to address.

Enhancing person localization using diffusion-guided region proposals
Mitigating shape bias through multi-scale frequency refinement
Leveraging text-aligned diffusion features for identity recognition
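As a rough intuition for what "multi-scale frequency" processing of a feature map can mean, the sketch below splits a 2D map into low, mid, and high spatial-frequency bands with an FFT. It is an assumption-laden illustration, not the MSFRN module: the page does not describe how MSFRN works, and the function name `split_frequency_bands`, the radial masks, and the cutoff values are all hypothetical.

```python
import numpy as np

def split_frequency_bands(feat, cutoffs=(0.1, 0.25)):
    """Split a 2D map into spatial-frequency bands (illustrative only).

    feat:    (H, W) real-valued array, e.g. one channel of a feature map
    cutoffs: increasing radial cutoffs in cycles per pixel (assumed values)
    Returns one (H, W) array per band, ordered from low to high frequency.
    """
    h, w = feat.shape
    fy = np.fft.fftfreq(h)[:, None]      # vertical frequencies
    fx = np.fft.fftfreq(w)[None, :]      # horizontal frequencies
    radius = np.sqrt(fx ** 2 + fy ** 2)  # distance from the DC component

    spectrum = np.fft.fft2(feat)
    edges = (0.0, *cutoffs, np.inf)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (radius >= lo) & (radius < hi)  # masks partition the spectrum
        bands.append(np.fft.ifft2(spectrum * mask).real)
    return bands

# Toy example: the bands sum back to the original map.
rng = np.random.default_rng(0)
feat = rng.standard_normal((32, 32))
bands = split_frequency_bands(feat)
reconstruction = sum(bands)
```

Because the masks partition the spectrum, the bands add up to the input, so a module could reweight individual bands (for example, emphasizing higher frequencies that carry texture) without losing information. Whether MSFRN does anything like this is not stated here.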
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses a pre-trained diffusion model as the source of prior knowledge for person search
Introduces three specialized modules (DGRPN, MSFRN, SFAN) designed around the properties of diffusion priors
Eliminates conflict between detection and re-identification tasks
Giyeol Kim
GSAIM, Chung-Ang University
Sooyoung Yang
IPAI, Seoul National University
Jihyong Oh
Assistant Prof. @ Chung-Ang Univ. (CAU), PhD/MS/BS @ KAIST
Computer Vision, Image/Video Processing, Deep Learning, Gen AI
Myungjoo Kang
IPAI, Seoul National University; Department of Mathematical Sciences and RIMS, Seoul National University
Chanho Eom
Assistant Professor @Chung-Ang University
Computer Vision, Machine Learning, Artificial Intelligence