🤖 AI Summary
In Pose-Guided Person Image Synthesis (PGPIS), latent diffusion models (LDMs) lose fine-grained detail, particularly in facial and garment textures, because of aggressive latent-space compression. To address this, the paper proposes a Multi-focal Conditioned Latent Diffusion (MCLD) method that conditions the LDM on disentangled, pose-invariant identity and texture features from these sensitive regions. Its multi-focal condition aggregation module adaptively fuses facial-identity and texture-specific information within the LDM framework, improving structural fidelity at fine-grained levels. Experiments on DeepFashion show substantial gains: +12.7% in identity preservation (ID-Retrieval) and −9.3% in perceptual distortion (LPIPS↓), enabling high-fidelity, highly controllable portrait editing. The source code is publicly available.
📝 Abstract
The Latent Diffusion Model (LDM) has demonstrated strong capabilities in high-resolution image generation and has been widely employed for Pose-Guided Person Image Synthesis (PGPIS), yielding promising results. However, LDM's compression process often deteriorates details, particularly in sensitive areas such as facial features and clothing textures. In this paper, we propose a Multi-focal Conditioned Latent Diffusion (MCLD) method that addresses these limitations by conditioning the model on disentangled, pose-invariant features from these sensitive regions. Our approach uses a multi-focal condition aggregation module, which effectively integrates facial identity and texture-specific information, enhancing the model's ability to produce appearance-realistic and identity-consistent images. Our method achieves consistent identity and appearance generation on the DeepFashion dataset and, owing to this consistency, enables flexible person image editing. The code is available at https://github.com/jqliu09/mcld.
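To make the idea of "aggregating multiple focal conditions" concrete, below is a minimal, dependency-free sketch of attention-style fusion of per-region condition features (e.g. a face-identity embedding and a garment-texture embedding) into a single conditioning vector. This is an illustration only: the actual MCLD module operates on learned image embeddings inside the diffusion network, and all function names and the query/weighting scheme here are assumptions, not the paper's implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def aggregate_conditions(query, focal_feats):
    """Fuse per-region condition features (face, garment, ...) into one
    conditioning vector via scaled-dot-product attention weights.

    query:        a vector the regions are scored against (hypothetical;
                  in a real model this would be a learned query)
    focal_feats:  list of equal-length feature vectors, one per region
    """
    scale = math.sqrt(len(query))
    scores = [dot(query, f) / scale for f in focal_feats]
    weights = softmax(scores)
    dim = len(focal_feats[0])
    fused = [sum(w * f[i] for w, f in zip(weights, focal_feats))
             for i in range(dim)]
    return fused, weights
```

For example, with a query aligned to the first region, `aggregate_conditions([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])` weights the face feature more heavily than the garment feature while still blending both, which is the qualitative behavior a condition-aggregation module is meant to provide.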