🤖 AI Summary
Existing virtual try-on (VTON) methods struggle with long-sleeve-to-short-sleeve conversion because the exposed skin regions are absent from the source image, leading to unrealistic skin reconstruction. To address this, we propose UR-VTON: a training-free, universal VTON framework built on an "undress-to-redress" two-stage paradigm that explicitly decouples garment removal from re-dressing. Methodologically, UR-VTON integrates dynamic classifier-free guidance (Dynamic CFG) scheduling during DDPM sampling to balance generation diversity against image quality, together with a high-frequency, feature-driven Structural Refiner for geometric fidelity and skin/texture realism. Evaluated on our newly constructed LS-TON benchmark, designed specifically for sleeve-length generalization, UR-VTON achieves state-of-the-art performance in skin authenticity, structural consistency, and cross-sleeve-length generalization, delivering markedly more credible and photorealistic clothing previews for e-commerce applications.
📝 Abstract
Virtual try-on (VTON) is a crucial task for enhancing the user experience in online shopping by generating realistic garment previews on personal photos. Although existing methods have achieved impressive results, they struggle with long-sleeve-to-short-sleeve conversion, a common and practical scenario, often producing unrealistic outputs when exposed skin is underrepresented in the original image. We argue that this challenge arises from the "majority" completion rule in current VTON models, which leads to inaccurate skin restoration in such cases. To address this, we propose UR-VTON (Undress-Redress Virtual Try-ON), a novel, training-free framework that can be seamlessly integrated with any existing VTON method. UR-VTON introduces an "undress-to-redress" mechanism: it first reveals the user's torso by virtually "undressing" them, then applies the target short-sleeve garment, effectively decomposing the conversion into two more manageable steps. Additionally, we incorporate Dynamic Classifier-Free Guidance scheduling to balance diversity and image quality during DDPM sampling, and employ a Structural Refiner to enhance detail fidelity using high-frequency cues. Finally, we present LS-TON, a new benchmark for long-sleeve-to-short-sleeve try-on. Extensive experiments demonstrate that UR-VTON outperforms state-of-the-art methods in both detail preservation and image quality. Code will be released upon acceptance.
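To make the Dynamic CFG idea concrete, here is a minimal sketch of how a time-varying guidance weight could be combined with the standard classifier-free guidance formula during DDPM sampling. The abstract does not specify the schedule, so the cosine-annealed shape, the endpoint weights (`w_min`, `w_max`), and the function names below are illustrative assumptions, not the paper's actual implementation (scalars stand in for the model's noise-prediction tensors):

```python
import math

def dynamic_cfg_scale(step: int, num_steps: int,
                      w_min: float = 1.5, w_max: float = 7.5) -> float:
    """Cosine-annealed guidance weight (assumed schedule): strong guidance
    early in sampling to lock in coarse structure, weaker guidance late
    to preserve diversity and fine detail."""
    t = step / max(num_steps - 1, 1)  # normalized progress, 0 -> 1
    return w_min + 0.5 * (w_max - w_min) * (1.0 + math.cos(math.pi * t))

def guided_noise(eps_uncond: float, eps_cond: float, w: float) -> float:
    """Standard classifier-free guidance combination of the unconditional
    and conditional noise predictions."""
    return eps_uncond + w * (eps_cond - eps_uncond)

# Toy loop: the guidance weight decays from w_max to w_min over 50 steps.
weights = [dynamic_cfg_scale(s, 50) for s in range(50)]
guided = guided_noise(eps_uncond=0.0, eps_cond=1.0, w=weights[0])
```

In a real sampler, `guided_noise` would be applied per step to the denoiser's two forward passes (with and without the garment/person conditioning) before the DDPM update.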