🤖 AI Summary
To address pervasive visual artifacts—such as distortion, misalignment, and blurring—in virtual try-on (VTON) and pose transfer, this paper proposes the first end-to-end conditional inpainting framework for automatic artifact detection and high-fidelity removal. Methodologically, it integrates semantic-guided mask prediction, a multi-scale feature fusion inpainting network, and a conditional diffusion model to jointly reconstruct corrupted regions with structural and textural coherence. Key contributions include: (1) the first fine-grained mask-annotated dataset specifically designed for artifact removal in VTON and pose transfer; and (2) the first end-to-end trainable inpainting architecture tailored to this task. Extensive experiments demonstrate state-of-the-art performance on quantitative metrics (PSNR, LPIPS) and in user studies, significantly improving image fidelity and perceptual naturalness over prior methods.
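The two-stage idea behind the summary—first predict a mask over artifact regions, then inpaint those regions conditioned on the surrounding context—can be illustrated with a toy NumPy sketch. This is not the paper's method: the function names, the per-region deviation heuristic standing in for semantic-guided mask prediction, and the neighbour-averaging fill standing in for the learned inpainting/diffusion networks are all hypothetical simplifications.

```python
import numpy as np

def predict_artifact_mask(image, semantic_map, threshold=0.3):
    # Toy stand-in for semantic-guided mask prediction: within each
    # semantic region, flag pixels that deviate strongly from the
    # region's mean intensity (hypothetical heuristic, not the paper's).
    mask = np.zeros_like(image, dtype=bool)
    for label in np.unique(semantic_map):
        region = semantic_map == label
        region_mean = image[region].mean()
        mask |= region & (np.abs(image - region_mean) > threshold)
    return mask

def inpaint(image, mask, iterations=50):
    # Naive diffusion-style fill standing in for the learned inpainting
    # network: repeatedly replace masked pixels with the average of
    # their 4-neighbours, propagating context inward from the boundary.
    out = image.copy()
    for _ in range(iterations):
        padded = np.pad(out, 1, mode="edge")
        neighbour_avg = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                         padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
        out[mask] = neighbour_avg[mask]
    return out

def remove_artifacts(image, semantic_map):
    # Detect-then-inpaint pipeline: masking and reconstruction in sequence.
    mask = predict_artifact_mask(image, semantic_map)
    return inpaint(image, mask), mask
```

In the actual framework both stages are trainable networks optimized jointly end-to-end, whereas this sketch only mimics the data flow: a mask localizes corrupted pixels, and reconstruction is conditioned on the uncorrupted surroundings.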
📝 Abstract
Artifacts often degrade the visual quality of virtual try-on (VTON) and pose transfer applications, impairing user experience. This study introduces a novel conditional inpainting technique designed to detect and remove such distortions, improving overall image quality. Our work is the first to present an end-to-end framework addressing this specific issue, and we construct a specialized dataset of artifacts from VTON and pose transfer tasks, complete with masks delineating the affected regions. Experimental results show that our method not only removes artifacts effectively but also significantly enhances the visual quality of the final images, establishing a new benchmark for artifact removal in these tasks.