🤖 AI Summary
Existing studies treat virtual try-on (VTON) and virtual try-off (VTOFF) in isolation, overlooking the symmetric, complementary relationship between garment and wearer. This work formulates VTON and VTOFF as a single bidirectional image translation task. We propose what is, to the best of our knowledge, the first diffusion-based framework to address both jointly. It uses bidirectional feature disentanglement so that one model handles mask-guided VTON and mask-free VTOFF, applies dual conditional guidance drawn from both the latent and pixel spaces of the reference image, and adopts a phased training paradigm to mitigate the asymmetry in mask dependency between the two tasks. On DressCode and VITON-HD, quantitative metrics and qualitative comparisons indicate improved generation quality in both directions relative to state-of-the-art methods, supporting the case for modeling the two tasks symmetrically.
📝 Abstract
While recent advances in virtual try-on (VTON) have achieved realistic garment transfer onto human subjects, the inverse task, virtual try-off (VTOFF), which aims to reconstruct canonical garment templates from images of dressed humans, remains underexplored. Existing works predominantly treat the two as isolated tasks: VTON focuses on dressing a person in a garment, while VTOFF focuses on extracting the garment, thereby neglecting their complementary symmetry. To bridge this gap, we propose the Two-Way Garment Transfer Model (TWGTM), which is, to the best of our knowledge, the first unified framework for joint clothing-centric image synthesis. It handles both mask-guided VTON and mask-free VTOFF through bidirectional feature disentanglement. Specifically, our framework employs dual-conditioned guidance from both the latent and pixel spaces of reference images to bridge the two tasks. In addition, to resolve the inherent asymmetry in mask dependency between mask-guided VTON and mask-free VTOFF, we devise a phased training paradigm that progressively closes this gap. Extensive qualitative and quantitative experiments on the DressCode and VITON-HD datasets validate the efficacy and competitiveness of our approach.