What Matters in Virtual Try-Off? Dual-UNet Diffusion Model For Garment Reconstruction

📅 2026-04-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses virtual try-off, the inverse of virtual try-on: reconstructing the canonical garment from an in-the-wild image of a person wearing it. It proposes a diffusion framework built on a Dual-UNet architecture and systematically explores generative backbones, conditioning inputs, and training strategies to expose the architectural and training trade-offs of the task. The study uses a Latent Diffusion Model backbone and examines mask conditioning, an attention-based auxiliary loss, perceptual loss, high-level semantic features, and multi-stage curriculum schedules. Evaluated on the VITON-HD and DressCode datasets, the framework reports state-of-the-art performance, with a 9.5% drop on the primary metric, DISTS, and competitive results on LPIPS, FID, KID, and SSIM.
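To make the masked versus unmasked conditioning choice concrete, here is a minimal sketch in plain Python. It assumes the mask is a binary grid that marks garment pixels with 1, and that masking means replacing every other pixel with a fill value. The function names `apply_mask` and `conditioning_input` are hypothetical and are not taken from the paper, which may define its masks and inputs differently.

```python
from typing import List

Grid = List[List[float]]


def apply_mask(image: Grid, mask: Grid, fill: float = 0.0) -> Grid:
    """Keep pixels where the mask is non-zero and replace the rest with `fill`.

    Assumption: `mask` is a binary garment mask with the same shape as `image`.
    """
    same_rows = len(image) == len(mask)
    same_cols = all(len(r) == len(m) for r, m in zip(image, mask))
    if not (same_rows and same_cols):
        raise ValueError("image and mask must have the same shape")
    return [
        [px if keep else fill for px, keep in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]


def conditioning_input(image: Grid, mask: Grid, use_mask: bool = True) -> Grid:
    """Return the image that would be fed to the conditioning branch.

    With `use_mask=False` the full person image is passed through unchanged
    (as a copy), which corresponds to the unmasked variant of the ablation.
    """
    if use_mask:
        return apply_mask(image, mask)
    return [row[:] for row in image]


# Toy 2x2 grayscale "image" and a mask selecting two pixels (made-up values).
image = [[0.2, 0.8], [0.5, 0.9]]
mask = [[0, 1], [1, 0]]
masked = conditioning_input(image, mask, use_mask=True)
unmasked = conditioning_input(image, mask, use_mask=False)
```

In a real pipeline the same operation would run on image tensors before encoding; the toy grid only illustrates which pixels each variant exposes to the model.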

📝 Abstract

Virtual Try-On (VTON) has seen rapid advancements, providing a strong foundation for generative fashion tasks. However, the inverse problem, Virtual Try-Off (VTOFF), which aims to reconstruct the canonical garment from a draped-on image, remains a less understood domain, distinct from the heavily researched field of VTON. In this work, we seek to establish a robust architectural foundation for VTOFF by studying and adapting various diffusion-based strategies from VTON and general Latent Diffusion Models (LDMs). We focus our investigation on the Dual-UNet Diffusion Model architecture and analyze three axes of design: (i) Generation Backbone: comparing Stable Diffusion variants; (ii) Conditioning: ablating different mask designs, masked/unmasked inputs for image conditioning, and the utility of high-level semantic features; and (iii) Losses and Training Strategies: evaluating the impact of the auxiliary attention-based loss, perceptual objectives, and multi-stage curriculum schedules. Extensive experiments reveal trade-offs across various configuration options. Evaluated on the VITON-HD and DressCode datasets, our framework achieves state-of-the-art performance with a drop of 9.5% on the primary metric DISTS and competitive performance on LPIPS, FID, KID, and SSIM, providing both stronger baselines and insights to guide future Virtual Try-Off research.
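The third design axis combines several loss terms under a multi-stage curriculum. The sketch below shows one way such a schedule could be wired up: a denoising loss that is always active, plus attention-based and perceptual terms whose weights switch on in later stages. The stage boundaries, the weights, and the names `Stage`, `SCHEDULE`, `stage_for`, and `total_loss` are all illustrative assumptions; the abstract does not specify the actual schedule or weighting.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Stage:
    name: str
    end_step: int        # stage covers steps strictly below this value
    w_attention: float   # weight of the auxiliary attention-based loss
    w_perceptual: float  # weight of the perceptual loss


# Hypothetical three-stage curriculum; every number here is made up.
SCHEDULE = (
    Stage("denoise_only", end_step=1000, w_attention=0.0, w_perceptual=0.0),
    Stage("add_attention", end_step=2000, w_attention=0.1, w_perceptual=0.0),
    Stage("add_perceptual", end_step=3000, w_attention=0.1, w_perceptual=0.5),
)


def stage_for(step: int, schedule: Sequence[Stage] = SCHEDULE) -> Stage:
    """Return the stage active at `step`; the last stage also covers later steps."""
    for stage in schedule:
        if step < stage.end_step:
            return stage
    return schedule[-1]


def total_loss(
    step: int,
    denoise: float,
    attention: float,
    perceptual: float,
    schedule: Sequence[Stage] = SCHEDULE,
) -> float:
    """Weighted sum of the three loss terms using the weights of the active stage."""
    stage = stage_for(step, schedule)
    return (
        denoise
        + stage.w_attention * attention
        + stage.w_perceptual * perceptual
    )
```

Calling `total_loss(step, ...)` inside a training loop would make the auxiliary terms contribute only once their stage is reached, which is the basic idea behind staging objectives rather than optimizing them all from the first step.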

Problem

Research questions and friction points this paper is trying to address.

Virtual Try-Off

Garment Reconstruction

Diffusion Model

Canonical Garment

Inverse Problem

Innovation

Methods, ideas, or system contributions that make the work stand out.

Virtual Try-Off

Dual-UNet Diffusion Model

Garment Reconstruction

Latent Diffusion Models

Conditioning Strategy
