Clothing agnostic Pre-inpainting Virtual Try-ON

📅 2025-09-22
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses key challenges in virtual try-on—namely, inaccurate bottom garment detection, residual clothing contours, and skin reconstruction artifacts during long-to-short sleeve conversion. We propose a diffusion model optimization framework integrating multi-class clothing masks and generative skin completion. Our method introduces a novel pose- and skin-tone-aware skin inpainting module within a two-stage architecture: pre-inpainting followed by re-synthesis. A fine-grained clothing category masking mechanism is incorporated to enhance generalization across diverse garments. Compatible with mainstream diffusion models (e.g., Stable Diffusion), our approach achieves 92.5% short-sleeve synthesis accuracy on the Dress Code benchmark—surpassing Leffa by 15.4%. Visual evaluation confirms substantial improvements in texture fidelity and style consistency. The framework demonstrates strong generalizability and scalability, enabling robust adaptation to varied garment types and poses.
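The two-stage flow described above (pre-inpainting, then re-synthesis) can be sketched roughly as follows. This is an illustrative toy, not the paper's implementation: the parsing label ids, the naive mean-colour skin fill, and the abstracted `synthesize` callable are all assumptions; the paper instead conditions a diffusion model on pose and skin tone.

```python
import numpy as np

# Label ids for a person-parsing map (hypothetical; the summary does not
# specify the paper's parsing scheme).
BACKGROUND, SKIN, TOP = 0, 1, 2

def inpaint_skin(image, parse_map):
    """Stage 1: replace old-garment pixels with a skin estimate so no
    clothing contour leaks into stage 2. Here the 'estimate' is just the
    mean visible-skin colour; CaP-VTON uses a pose- and skin-tone-aware
    diffusion inpainting module instead."""
    out = image.astype(np.float64).copy()
    garment = parse_map == TOP
    skin_pixels = image[parse_map == SKIN]
    out[garment] = skin_pixels.mean(axis=0)  # naive skin fill
    return out, garment

def try_on(image, parse_map, synthesize):
    """Stage 2: pass the clothing-agnostic person and the garment mask to
    a synthesizer (e.g. a diffusion inpainting model, abstracted here as
    a callable taking (person, mask))."""
    person, mask = inpaint_skin(image, parse_map)
    return synthesize(person, mask)
```

The point of the pre-inpainting stage is that the synthesizer in stage 2 never sees the original garment's silhouette, which is the residual-contour failure mode the summary attributes to prior work.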

📝 Abstract
With the development of deep learning, virtual try-on has become an application of significant value in e-commerce, fashion, and entertainment. The recently proposed Leffa improved the texture-distortion problem of diffusion-based models, but limitations remain: the bottom garment is detected inaccurately, and the silhouette of the original clothing persists in the synthesis results. To solve these problems, this study proposes CaP-VTON (Clothing-agnostic Pre-inpainting Virtual Try-ON). CaP-VTON improves the naturalness and consistency of whole-body clothing synthesis by integrating multi-category masking based on Dress Code and skin inpainting based on Stable Diffusion. In particular, a generate-skin module is introduced to solve the skin-restoration problem that occurs when long-sleeved images are converted into short-sleeved or sleeveless ones, producing high-quality restoration that accounts for body posture and skin color. As a result, CaP-VTON achieved 92.5% short-sleeve synthesis accuracy, 15.4% better than Leffa, and consistently reproduced the style and shape of the reference clothing in visual evaluation. The architecture remains model-agnostic, is applicable to various diffusion-based virtual try-on systems, and can benefit applications that require high-precision virtual fitting, such as e-commerce, custom styling, and avatar creation.
Problem

Research questions and friction points this paper is trying to address.

Improving virtual try-on synthesis accuracy and naturalness
Solving skin restoration issues when changing sleeve lengths
Addressing clothing silhouette preservation and body detection inaccuracies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates multi-category masking based on Dress Code
Uses skin inpainting based on Stable Diffusion
Introduces a generate-skin module for skin restoration
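The multi-category masking idea can be illustrated with a small sketch. Dress Code's three garment categories (upper_body, lower_body, dresses) are real, but the body-region names and parsing label ids below are illustrative assumptions:

```python
import numpy as np

# A fine-grained mask limits inpainting to the regions the target garment
# category can occupy, so e.g. trying on trousers never erases the top.
# Region names and label ids are assumptions for illustration.
CATEGORY_REGIONS = {
    "upper_body": {"torso", "arms"},
    "lower_body": {"hips", "legs"},
    "dresses": {"torso", "arms", "hips", "legs"},
}
LABELS = {"torso": 1, "arms": 2, "hips": 3, "legs": 4}

def category_mask(parse_map, category):
    """Boolean mask of pixels the target garment category may replace."""
    mask = np.zeros(parse_map.shape, dtype=bool)
    for region in CATEGORY_REGIONS[category]:
        mask |= parse_map == LABELS[region]
    return mask
```

Keeping the mask category-specific is what lets a single pipeline generalize across tops, bottoms, and dresses without over-erasing the person image.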
Sehyun Kim
Department of Artificial Intelligence and Software, Kangwon National University, Korea
Hye Jun Lee
Department of Artificial Intelligence and Software, Kangwon National University, Korea
Jiwoo Lee
Taemin Lee
Department of Electronic and AI System Engineering, Kangwon National University, Korea