🤖 AI Summary
Existing virtual try-on (VTON) methods struggle with long-sleeve-to-short-sleeve conversion because the exposed skin regions are absent from the source image, leading to unrealistic skin reconstruction. To address this, we propose UR-VTON: a training-free, universal VTON framework built on an "undress-to-redress" two-stage paradigm that explicitly decouples garment removal from re-dressing. Methodologically, UR-VTON integrates dynamic classifier-free guidance (Dynamic CFG) scheduling during DDPM sampling to balance generation diversity against image quality, together with a high-frequency, feature-driven Structural Refiner for geometric fidelity and skin/texture realism. Evaluated on our newly constructed LS-TON benchmark, designed specifically for sleeve-length generalization, UR-VTON achieves state-of-the-art performance in skin authenticity, structural consistency, and cross-sleeve-length generalization, delivering markedly more credible and photorealistic clothing previews for e-commerce applications.
📝 Abstract
Virtual try-on (VTON) is a crucial task for enhancing the user experience in online shopping by generating realistic garment previews on personal photos. Although existing methods have achieved impressive results, they struggle with long-sleeve-to-short-sleeve conversion, a common and practical scenario, often producing unrealistic outputs when exposed skin is underrepresented in the original image. We argue that this challenge arises from the "majority" completion rule in current VTON models, which leads to inaccurate skin restoration in such cases. To address this, we propose UR-VTON (Undress-Redress Virtual Try-ON), a novel, training-free framework that can be seamlessly integrated with any existing VTON method. UR-VTON introduces an "undress-to-redress" mechanism: it first reveals the user's torso by virtually "undressing" them, then applies the target short-sleeve garment, effectively decomposing the conversion into two more manageable steps. Additionally, we incorporate Dynamic Classifier-Free Guidance scheduling to balance diversity and image quality during DDPM sampling, and employ a Structural Refiner to enhance detail fidelity using high-frequency cues. Finally, we present LS-TON, a new benchmark for long-sleeve-to-short-sleeve try-on. Extensive experiments demonstrate that UR-VTON outperforms state-of-the-art methods in both detail preservation and image quality. Code will be released upon acceptance.
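To make the Dynamic CFG idea concrete, here is a minimal sketch of how a time-varying guidance weight could be combined with the standard classifier-free guidance formula during DDPM sampling. The abstract does not specify the schedule, so the cosine-annealed shape, the endpoint weights (`w_min`, `w_max`), and the function names below are illustrative assumptions, not the paper's actual implementation (scalars stand in for the model's noise-prediction tensors):

```python
import math

def dynamic_cfg_scale(step: int, num_steps: int,
                      w_min: float = 1.5, w_max: float = 7.5) -> float:
    """Cosine-annealed guidance weight (assumed schedule): strong guidance
    early in sampling to lock in coarse structure, weaker guidance late
    to preserve diversity and fine detail."""
    t = step / max(num_steps - 1, 1)  # normalized progress, 0 -> 1
    return w_min + 0.5 * (w_max - w_min) * (1.0 + math.cos(math.pi * t))

def guided_noise(eps_uncond: float, eps_cond: float, w: float) -> float:
    """Standard classifier-free guidance combination of the unconditional
    and conditional noise predictions."""
    return eps_uncond + w * (eps_cond - eps_uncond)

# Toy loop: the guidance weight decays from w_max to w_min over 50 steps.
weights = [dynamic_cfg_scale(s, 50) for s in range(50)]
guided = guided_noise(eps_uncond=0.0, eps_cond=1.0, w=weights[0])
```

In a real sampler, `guided_noise` would be applied per step to the denoiser's two forward passes (with and without the garment/person conditioning) before the DDPM update.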