MF-VITON: High-Fidelity Mask-Free Virtual Try-On with Minimal Input

📅 2025-03-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing virtual try-on methods rely on manually annotated human masks, which are prone to annotation errors and entail cumbersome preprocessing. This paper proposes the first end-to-end mask-free virtual try-on framework, generating high-fidelity try-on results from only a single person image and a target garment image. Our method follows a two-stage paradigm: first, we synthesize a high-quality person-garment paired mask dataset using diffusion models; second, we fine-tune a try-on model to enable mask-free end-to-end inference. To enhance generalization, we introduce background-augmented data synthesis and a tailored transfer learning mechanism, significantly improving garment deformation modeling, texture preservation, and overall visual realism. Extensive experiments demonstrate state-of-the-art performance across multiple benchmarks, comprehensively outperforming all existing mask-dependent approaches.

📝 Abstract
Recent advancements in Virtual Try-On (VITON) have significantly improved image realism and garment detail preservation, driven by powerful text-to-image (T2I) diffusion models. However, existing methods often rely on user-provided masks, introducing complexity and performance degradation due to imperfect inputs, as shown in Fig.1(a). To address this, we propose a Mask-Free VITON (MF-VITON) framework that achieves realistic VITON using only a single person image and a target garment, eliminating the requirement for auxiliary masks. Our approach introduces a novel two-stage pipeline: (1) We leverage existing Mask-based VITON models to synthesize a high-quality dataset. This dataset contains diverse, realistic pairs of person images and corresponding garments, augmented with varied backgrounds to mimic real-world scenarios. (2) The pre-trained Mask-based model is fine-tuned on the generated dataset, enabling garment transfer without mask dependencies. This stage simplifies the input requirements while preserving garment texture and shape fidelity. Our framework achieves state-of-the-art (SOTA) performance regarding garment transfer accuracy and visual realism. Notably, the proposed Mask-Free model significantly outperforms existing Mask-based approaches, setting a new benchmark and demonstrating a substantial lead over previous approaches. For more details, visit our project page: https://zhenchenwan.github.io/MF-VITON/.
Problem

Research questions and friction points this paper is trying to address.

Existing VITON methods depend on user-provided masks, which add preprocessing complexity and degrade results when the masks are imperfect.
Input requirements should be reduced to a single person image and a target garment image.
Garment transfer accuracy and visual realism need to improve over mask-based baselines.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mask-free framework: generates try-on results from only a person image and a garment image, with no auxiliary masks.
Two-stage pipeline: a pre-trained mask-based model synthesizes a background-augmented paired dataset, then the model is fine-tuned on it for mask-free inference.
State-of-the-art garment texture preservation and visual realism across multiple benchmarks.
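The two-stage paradigm described above can be sketched in toy form. All function and variable names below are illustrative placeholders (the paper does not release this interface); images are stood in by strings so the data flow is visible.

```python
def mask_based_tryon(person, garment, mask):
    """Stand-in for a pre-trained mask-based VITON model (stage 1 only).
    In the real pipeline this would be a diffusion-based try-on network."""
    return f"{person}+{garment}"

def synthesize_pairs(persons, garments, masks, backgrounds):
    """Stage 1 (schematic): use the mask-based model to build a paired
    dataset, augmenting each sample with a varied background to mimic
    real-world scenes. Masks are consumed here and nowhere else."""
    dataset = []
    for p, g, m, bg in zip(persons, garments, masks, backgrounds):
        result = mask_based_tryon(p, g, m)
        dataset.append({
            "person": f"{p}@{bg}",        # background-augmented input image
            "garment": g,                 # target garment image
            "target": f"{result}@{bg}",   # synthesized try-on ground truth
        })
    return dataset

def finetune_mask_free(dataset):
    """Stage 2 (schematic): fine-tune on (person, garment) -> target pairs,
    so that inference takes only two images and no mask. Here we just
    materialize the mask-free training triples."""
    return [(s["person"], s["garment"], s["target"]) for s in dataset]
```

The key design point the sketch captures is that mask dependence is confined to offline dataset synthesis: the fine-tuned model's inputs are exactly the two images a user would provide.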
Zhenchen Wan
University of Melbourne, Melbourne, Australia
Yanwu Xu
University of Melbourne, Melbourne, Australia
Dongting Hu
University of Melbourne
Computer Vision, Generative AI
Weilun Cheng
University of Melbourne, Melbourne, Australia
Tianxi Chen
University of Melbourne, Melbourne, Australia
Zhaoqing Wang
Columbia University
Feng Liu
University of Melbourne, Melbourne, Australia
Tongliang Liu
Director, Sydney AI Centre, University of Sydney & Mohamed bin Zayed University of AI
Machine Learning, Learning with Noisy Labels, Trustworthy Machine Learning
Mingming Gong
University of Melbourne & Mohamed bin Zayed University of Artificial Intelligence
Causal Inference, Machine Learning, Computer Vision