🤖 AI Summary
This work addresses two limitations of existing virtual try-on methods: they accept only garment-related images as design inputs, excluding non-garment visual sources such as artworks, and they do not support full-outfit editing that includes accessories. To overcome these limitations, the paper proposes FEAT (Fashion Editing And Try-On from Any Design), a framework that enables editing and virtual try-on from diverse design sources by injecting design cues through content and style disentanglement. FEAT combines Disentangled Dual Injection (DDI), which selectively injects design cues from both apparel and non-apparel sources, with Orthogonal-Guided Noise Fusion (OGNF), a training-free mechanism that removes residual garments via orthogonal projection and applies region-specific noise strategies. The reported experiments indicate that FEAT achieves state-of-the-art performance in design flexibility, prompt consistency, and visual realism, and that it supports complete outfits with accessories.
📝 Abstract
Fashion design aims to express a designer's creative intent and to depict how garments interact with the human body. Recent methods condition on multimodal inputs to support garment editing and virtual try-on. However, existing methods still (i) confine design to garment-related images, excluding creative design sources such as artwork, abstract imagery, and natural photographs, and (ii) cannot support complete outfits, including accessories. We present FEAT (Fashion Editing And Try-On from Any Design), a method that enables editing and try-on across garments and accessories using diverse design sources. To achieve this, we introduce Disentangled Dual Injection (DDI). It takes both apparel and non-apparel design sources and selectively injects design cues via content and style disentanglement. Furthermore, we propose Orthogonal-Guided Noise Fusion (OGNF), a training-free mechanism that removes residual garments via orthogonal projection and applies region-specific noise strategies to enable virtual try-on for both garments and accessories. Extensive experiments demonstrate that FEAT achieves state-of-the-art performance in design flexibility, prompt consistency, and visual realism.
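The abstract does not spell out how OGNF is computed, so the following is only a minimal sketch of the two generic ideas it names: removing a component via orthogonal projection, and fusing noise per region with a mask. It is not the authors' implementation. All names (`project_out`, `fuse_by_region`, `design_noise`, `residual_noise`, `background_noise`, `mask`) and the tensor shapes are hypothetical, chosen purely for illustration.

```python
import numpy as np


def project_out(guidance, residual):
    """Return `guidance` with its component along `residual` removed.

    Standard orthogonal projection: g - (<g, r> / <r, r>) * r,
    computed over the flattened arrays.
    """
    g = guidance.ravel()
    r = residual.ravel()
    denom = float(r @ r)
    if denom == 0.0:
        # Nothing to project out when the residual direction is zero.
        return guidance.copy()
    return (g - (g @ r) / denom * r).reshape(guidance.shape)


def fuse_by_region(noise_inside, noise_outside, mask):
    """Use `noise_inside` where mask == 1 and `noise_outside` where mask == 0."""
    return mask * noise_inside + (1.0 - mask) * noise_outside


# Hypothetical stand-ins for noise predictions (channels, height, width).
rng = np.random.default_rng(0)
shape = (4, 8, 8)
design_noise = rng.standard_normal(shape)      # assumed: guidance toward the new design
residual_noise = rng.standard_normal(shape)    # assumed: direction tied to the old garment
background_noise = rng.standard_normal(shape)  # assumed: noise for untouched regions

# Hypothetical binary mask marking the edited region; broadcasts over channels.
mask = np.zeros((1, 8, 8))
mask[:, 2:6, 2:6] = 1.0

cleaned = project_out(design_noise, residual_noise)
fused = fuse_by_region(cleaned, background_noise, mask)
```

After projection, `cleaned` has (numerically) zero inner product with `residual_noise`, which is the sense in which a residual direction is "removed". The mask then restricts the cleaned signal to the edited region and leaves the rest governed by `background_noise`. How FEAT actually defines these directions and regions is not stated in the abstract.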