🤖 AI Summary
Existing virtual try-on methods rely on manually annotated human masks, which are prone to annotation errors and require cumbersome preprocessing. This paper proposes the first end-to-end mask-free virtual try-on framework, which generates high-fidelity try-on results from only a single person image and a target garment image. The method follows a two-stage paradigm: first, a high-quality person-garment paired mask dataset is synthesized using diffusion models; second, a try-on model is fine-tuned on this dataset to enable mask-free end-to-end inference. To enhance generalization, the authors introduce background-augmented data synthesis and a tailored transfer-learning mechanism, which significantly improve garment deformation modeling, texture preservation, and overall visual realism. Extensive experiments demonstrate state-of-the-art performance across multiple benchmarks, outperforming existing mask-dependent approaches.
📝 Abstract
Recent advancements in Virtual Try-On (VITON) have significantly improved image realism and garment detail preservation, driven by powerful text-to-image (T2I) diffusion models. However, existing methods often rely on user-provided masks, introducing complexity and performance degradation due to imperfect inputs, as shown in Fig. 1(a). To address this, we propose a Mask-Free VITON (MF-VITON) framework that achieves realistic VITON using only a single person image and a target garment, eliminating the need for auxiliary masks. Our approach introduces a novel two-stage pipeline: (1) We leverage existing mask-based VITON models to synthesize a high-quality dataset containing diverse, realistic pairs of person images and corresponding garments, augmented with varied backgrounds to mimic real-world scenarios. (2) The pre-trained mask-based model is fine-tuned on the generated dataset, enabling garment transfer without mask dependencies. This stage simplifies the input requirements while preserving garment texture and shape fidelity. Our framework achieves state-of-the-art (SOTA) performance in garment transfer accuracy and visual realism. Notably, the proposed mask-free model significantly outperforms existing mask-based approaches, setting a new benchmark. For more details, visit our project page: https://zhenchenwan.github.io/MF-VITON/.