🤖 AI Summary
This work addresses the inefficiency of existing diffusion- and flow-based virtual try-on (VTON) methods. These methods rely on multi-step sampling, and their acceleration strategies often overlook the strong structural constraints that the conditional inputs impose on the output. To overcome these limitations, the authors propose a one-step paradigm for high-quality try-on image generation that, for the first time, aligns one-step synthesis with conditional transport. The approach straightens the sampling trajectory through pure conditional transport, augmented by a garment preservation loss and a self-consistency loss, followed by a dedicated one-step distillation stage. The resulting method achieves state-of-the-art generation quality with one-step sampling, substantially improving inference efficiency and establishing a new standard for efficient virtual try-on.
📝 Abstract
Recent diffusion- and flow-based virtual try-on (VTON) methods achieve strong results by building on pretrained generative models. However, their reliance on multi-step sampling incurs high inference cost, and existing acceleration methods largely overlook the intrinsic structure of the try-on task. In this paper, we highlight a key observation: VTON outputs are highly constrained by the conditional inputs. This suggests that the conditional sampling trajectory can be much straighter than in general image generation, making one-step generation a natural solution. However, limited task-specific data makes training from scratch impractical, forcing existing methods to fine-tune pretrained models whose objectives do not encourage such straight conditional trajectories. The deviation from an ideal straight path therefore stems mainly from the mismatch between pretrained base models and the conditional nature of try-on generation, rather than from the task itself. Motivated by this insight, we encourage straighter VTON sampling trajectories through three targeted modifications: pure conditional transport, a garment preservation loss, and a self-consistency loss. We further introduce a one-step distillation stage. Extensive experiments show that our method achieves state-of-the-art performance with one-step sampling, establishing a new standard for efficient, high-quality VTON.
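The premise that a straighter conditional trajectory makes one-step generation natural can be illustrated with a toy example. This sketch assumes a generic flow-style formulation, in which a sample is produced by integrating a velocity field from t = 0 to t = 1 with Euler steps. It is not the authors' method. The 2-D points, the two paths, and the names `euler_sample`, `straight_velocity`, and `curved_velocity` are hypothetical illustrations.

```python
import numpy as np


def euler_sample(velocity_fn, x0, num_steps):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed-step Euler."""
    x = np.asarray(x0, dtype=float).copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x


# Hypothetical toy endpoints: x0 stands in for the starting state of the
# transport and x1 for the desired try-on output.
x0 = np.array([0.5, 0.5])
x1 = np.array([2.0, -1.0])
d = np.array([0.0, 1.0])  # direction of the detour taken by the curved path


def straight_velocity(x, t):
    # Velocity of the straight path x_t = (1 - t) * x0 + t * x1; constant in t.
    return x1 - x0


def curved_velocity(x, t):
    # Velocity of the curved path x_t = (1 - t) * x0 + t * x1 + sin(pi * t) * d,
    # which also starts at x0 and ends at x1 but bends away from the straight line.
    return x1 - x0 + np.pi * np.cos(np.pi * t) * d


for steps in (1, 4, 100):
    err_straight = np.linalg.norm(euler_sample(straight_velocity, x0, steps) - x1)
    err_curved = np.linalg.norm(euler_sample(curved_velocity, x0, steps) - x1)
    print(f"steps={steps:3d}  straight error={err_straight:.4f}  curved error={err_curved:.4f}")
```

On the straight path, a single Euler step lands exactly on `x1`. On the curved path, the endpoint error of N Euler steps is pi/N times the detour size, so only many steps recover the target. In this toy setting, straightening the trajectory is what makes one-step sampling accurate.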