ODPG: Outfitting Diffusion with Pose Guided Condition

📅 2025-01-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses two challenges in virtual try-on (VTON): synthesizing highly realistic garments under dynamic poses, and the artifacts and information loss introduced by conventional explicit garment-warping modules. It proposes an end-to-end latent diffusion framework conditioned on multiple modalities, eliminating explicit warping via a novel pose-guided multimodal latent feature fusion mechanism. The architecture uses three parallel conditional encoders (for garment, pose, and appearance) whose features are integrated into a cross-attention-enhanced UNet denoiser, enabling fine-grained texture preservation and precise pose alignment. Quantitative and qualitative evaluations on FashionTryOn and a DeepFashion subset demonstrate significant improvements in pose consistency and texture fidelity over state-of-the-art GAN- and diffusion-based baselines.
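The summary's fusion idea — three parallel condition streams attended to by the UNet's image latents via cross-attention — can be illustrated with a minimal sketch. This is not the paper's implementation: the dimensions, the single unprojected attention step, and the simple concatenation of condition tokens are all assumptions made for brevity.

```python
import numpy as np

def cross_attention(query, context, d_k):
    """Scaled dot-product attention: image tokens attend to condition tokens."""
    scores = query @ context.T / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ context

# Hypothetical shapes: 16 noisy image latent tokens, 4 tokens per condition stream
d = 8
rng = np.random.default_rng(0)
image_tokens = rng.standard_normal((16, d))   # noisy image latents in the denoiser
garment = rng.standard_normal((4, d))         # garment encoder output
pose = rng.standard_normal((4, d))            # pose encoder output
appearance = rng.standard_normal((4, d))      # appearance encoder output

# Fuse the three condition streams into one context sequence, then let the
# image tokens attend to it (query/key/value projections omitted for brevity).
context = np.concatenate([garment, pose, appearance], axis=0)
fused = cross_attention(image_tokens, context, d)
print(fused.shape)  # (16, 8)
```

Because each output row is a convex combination of condition tokens, the fused features stay within the span of the garment, pose, and appearance encodings — the "non-explicit" alternative to warping the garment image directly.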

📝 Abstract
Virtual Try-On (VTON) technology allows users to visualize how clothes would look on them without physically trying them on, gaining traction with the rise of digitalization and online shopping. Traditional VTON methods, often using Generative Adversarial Networks (GANs) and Diffusion models, face challenges in achieving high realism and handling dynamic poses. This paper introduces Outfitting Diffusion with Pose Guided Condition (ODPG), a novel approach that leverages a latent diffusion model with multiple conditioning inputs during the denoising process. By transforming garment, pose, and appearance images into latent features and integrating these features in a UNet-based denoising model, ODPG achieves non-explicit synthesis of garments on dynamically posed human images. Our experiments on the FashionTryOn and a subset of the DeepFashion dataset demonstrate that ODPG generates realistic VTON images with fine-grained texture details across various poses, utilizing an end-to-end architecture without the need for explicit garment warping processes. Future work will focus on generating VTON outputs in video format and on applying our attention mechanism, as detailed in the Method section, to other domains with limited data.
Problem

Research questions and friction points this paper is trying to address.

Virtual Try-On
High Fidelity
Dynamic Poses
Innovation

Methods, ideas, or system contributions that make the work stand out.

Outfitting Diffusion
Pose Guided Condition
Virtual Try-On
Seohyun Lee
Korea University
VLM, VLA, Generative Models
Jintae Park
Korea University, Seoul, South Korea
Sanghyeok Park
Korea University, Seoul, South Korea