🤖 AI Summary
This work addresses key challenges in virtual try-on, namely inaccurate bottom-garment detection, residual clothing contours, and skin reconstruction artifacts during long-to-short sleeve conversion. We propose a diffusion model optimization framework integrating multi-class clothing masks and generative skin completion. Our method introduces a pose- and skin-tone-aware skin inpainting module within a two-stage architecture: pre-inpainting followed by re-synthesis. A fine-grained clothing category masking mechanism enhances generalization across diverse garments. Compatible with mainstream diffusion models (e.g., Stable Diffusion), the approach achieves 92.5% short-sleeve synthesis accuracy on the Dress Code benchmark, surpassing Leffa by 15.4%. Visual evaluation confirms substantial improvements in texture fidelity and style consistency. The framework demonstrates strong generalizability and scalability, enabling robust adaptation to varied garment types and poses.
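The fine-grained category masking can be pictured as a union of per-category garment masks: only the categories being replaced are erased from the person image, leaving the rest clothing-agnostic. Below is a minimal illustrative sketch; the function name and the set-of-pixels mask representation are assumptions for exposition, not the paper's actual implementation:

```python
def build_agnostic_mask(category_masks, target_categories):
    """Union the pixel sets of every garment category being replaced.

    category_masks: hypothetical output of a human-parsing step, mapping
    a Dress Code-style category name ("upper_body", "lower_body",
    "dresses") to the set of (row, col) pixels that garment covers.
    target_categories: the categories the try-on will repaint; pixels of
    all other categories stay untouched.
    """
    agnostic = set()
    for name, pixels in category_masks.items():
        if name in target_categories:
            agnostic |= pixels
    return agnostic

masks = {
    "upper_body": {(0, 0), (0, 1)},
    "lower_body": {(5, 0), (5, 1)},
}
# Replacing only the top keeps the bottom garment's pixels out of the mask,
# which is how per-category masking avoids disturbing the other garments.
print(sorted(build_agnostic_mask(masks, {"upper_body"})))  # [(0, 0), (0, 1)]
```

Keeping the mask per-category is what lets the same pipeline handle tops, bottoms, and dresses without a separate model per garment type.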
📝 Abstract
With the development of deep learning, virtual try-on technology has gained significant application value in e-commerce, fashion, and entertainment. The recently proposed Leffa alleviates the texture distortion of diffusion-based models, but it remains limited by inaccurate bottom-garment detection and by residual silhouettes of the original clothing in the synthesis results. To address these problems, this study proposes CaP-VTON (Clothing agnostic Pre-inpainting Virtual Try-ON). CaP-VTON improves the naturalness and consistency of whole-body clothing synthesis by integrating multi-category masking based on Dress Code with skin inpainting based on Stable Diffusion. In particular, a skin generation module is introduced to solve the skin restoration problem that arises when long-sleeved images are converted into short-sleeved or sleeveless ones, producing high-quality restoration that accounts for body pose and skin tone. As a result, CaP-VTON achieves 92.5% short-sleeve synthesis accuracy, 15.4% higher than Leffa, and consistently reproduces the style and shape of the reference clothing in visual evaluation. The architecture remains model-agnostic, is applicable to various diffusion-based virtual try-on systems, and can contribute to applications that require high-precision virtual try-on, such as e-commerce, custom styling, and avatar creation.
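As a rough illustration of the skin-restoration problem the abstract describes: when a long sleeve is converted to a short one, the region that must be filled with generated skin is the set of pixels the original sleeve covered but the target garment does not. This is a hedged sketch under that assumption; the names and mask representation are hypothetical, not CaP-VTON's actual code:

```python
def skin_inpaint_region(original_sleeve, target_garment):
    """Pixels exposed by the conversion: covered by the long sleeve but
    left bare by the target short-sleeve garment. These are the pixels a
    skin generation module would synthesize, conditioned on body pose and
    on the person's visible skin tone for color consistency."""
    return set(original_sleeve) - set(target_garment)

long_sleeve = {(r, c) for r in range(4) for c in range(2)}   # full arm
short_sleeve = {(r, c) for r in range(2) for c in range(2)}  # upper arm only
# The forearm rows (2 and 3) need generated skin.
print(sorted(skin_inpaint_region(long_sleeve, short_sleeve)))
# → [(2, 0), (2, 1), (3, 0), (3, 1)]
```

Pre-computing this region before synthesis is what distinguishes a pre-inpainting design from naively letting the diffusion model hallucinate the exposed arm.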