Jointly Conditioned Diffusion Model for Multi-View Pose-Guided Person Image Synthesis

📅 2025-11-19

🤖 AI Summary
To address incomplete textures and poor cross-view consistency in pose-guided human image generation from single-view reference images, this paper proposes a jointly conditioned diffusion model (JCDM). It introduces (1) an Appearance Prior Module (APM) that explicitly models multi-view correspondences of identity, color, and texture across poses; and (2) a Joint Conditional Injection (JCI) mechanism that adaptively fuses multi-view features and injects them into the denoising network, supporting a variable number of reference inputs while preserving architectural simplicity. Evaluated on standard benchmarks, the method improves the visual fidelity of generated images and their cross-view consistency in geometry and appearance, achieving state-of-the-art performance. It also generalizes well across diverse poses, identities, and clothing styles.

📝 Abstract
Pose-guided human image generation is limited by incomplete textures from single reference views and the absence of explicit cross-view interaction. We present the jointly conditioned diffusion model (JCDM), a diffusion framework that exploits multi-view priors. The appearance prior module (APM) infers a holistic, identity-preserving prior from incomplete references, and the joint conditional injection (JCI) mechanism fuses multi-view cues and injects shared conditioning into the denoising backbone to align identity, color, and texture across poses. JCDM supports a variable number of reference views and integrates with standard diffusion backbones through minimal, targeted architectural modifications. Experiments demonstrate state-of-the-art fidelity and cross-view consistency.
Problem

Research questions and friction points this paper is trying to address.

Generating person images from incomplete single-view texture references
Addressing lack of explicit cross-view interaction in pose-guided synthesis
Aligning identity, color and texture consistency across multiple poses
Innovation

Methods, ideas, or system contributions that make the work stand out.

Jointly conditioned diffusion framework with multi-view priors
Appearance prior module infers holistic identity from references
Joint conditional injection fuses multi-view cues for alignment
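
The idea of fusing a variable number of reference views into one shared conditioning signal can be illustrated with a small, dependency-free sketch. This is not the paper's JCI design, which the summary does not specify; it assumes softmax attention pooling over per-view feature vectors followed by a residual add, and the names `fuse_views` and `inject` are hypothetical.

```python
import math


def fuse_views(query, view_feats):
    """Attention-pool per-view features into one conditioning vector.

    query:      list of d floats, e.g. a feature from the denoiser (assumed).
    view_feats: list of N lists of d floats, one per reference view; N may vary.
    Returns (fused, weights) where fused has length d and weights has length N.
    """
    if not view_feats:
        raise ValueError("at least one reference view is required")
    d = len(query)
    # Scaled dot-product score between the query and each view feature.
    scores = [
        sum(q * k for q, k in zip(query, feat)) / math.sqrt(d)
        for feat in view_feats
    ]
    # Numerically stable softmax across views.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The weighted sum has the same size regardless of the number of views.
    fused = [
        sum(w * feat[i] for w, feat in zip(weights, view_feats))
        for i in range(d)
    ]
    return fused, weights


def inject(hidden, fused, scale=1.0):
    """Add the fused conditioning to a hidden feature (assumed residual form)."""
    return [h + scale * f for h, f in zip(hidden, fused)]


# Usage: the same functions handle one or three reference views.
query = [1.0, 0.0]
fused_one, w_one = fuse_views(query, [[0.5, 0.5]])
fused_three, w_three = fuse_views(query, [[2.0, 0.0], [0.0, 2.0], [0.0, 2.0]])
conditioned = inject([1.0, 1.0], fused_one)
```

Because the softmax normalizes over however many views are supplied, the fused vector keeps a fixed size, which is one simple way a denoising backbone could accept a variable number of references without structural changes.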