MF-VITON: High-Fidelity Mask-Free Virtual Try-On with Minimal Input

📅 2025-03-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing virtual try-on methods rely on manually annotated human masks, which are prone to annotation errors and entail cumbersome preprocessing. This paper proposes the first end-to-end mask-free virtual try-on framework, generating high-fidelity try-on results from only a single person image and a target garment image. Our method follows a two-stage paradigm: first, we synthesize a high-quality person-garment paired mask dataset using diffusion models; second, we fine-tune a try-on model to enable mask-free end-to-end inference. To enhance generalization, we introduce background-augmented data synthesis and a tailored transfer learning mechanism, significantly improving garment deformation modeling, texture preservation, and overall visual realism. Extensive experiments demonstrate state-of-the-art performance across multiple benchmarks, comprehensively outperforming all existing mask-dependent approaches.

📝 Abstract
Recent advancements in Virtual Try-On (VITON) have significantly improved image realism and garment detail preservation, driven by powerful text-to-image (T2I) diffusion models. However, existing methods often rely on user-provided masks, introducing complexity and performance degradation due to imperfect inputs, as shown in Fig.1(a). To address this, we propose a Mask-Free VITON (MF-VITON) framework that achieves realistic VITON using only a single person image and a target garment, eliminating the requirement for auxiliary masks. Our approach introduces a novel two-stage pipeline: (1) We leverage existing Mask-based VITON models to synthesize a high-quality dataset. This dataset contains diverse, realistic pairs of person images and corresponding garments, augmented with varied backgrounds to mimic real-world scenarios. (2) The pre-trained Mask-based model is fine-tuned on the generated dataset, enabling garment transfer without mask dependencies. This stage simplifies the input requirements while preserving garment texture and shape fidelity. Our framework achieves state-of-the-art (SOTA) performance regarding garment transfer accuracy and visual realism. Notably, the proposed Mask-Free model significantly outperforms existing Mask-based approaches, setting a new benchmark and demonstrating a substantial lead over previous approaches. For more details, visit our project page: https://zhenchenwan.github.io/MF-VITON/.
Problem

Research questions and friction points this paper is trying to address.

Existing VITON methods depend on user-provided masks, which add preprocessing complexity and degrade results when the masks are imperfect.
Input requirements should be reduced to a single person image and a target garment image.
Garment transfer accuracy and visual realism need to improve over mask-based baselines.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mask-free framework: generates try-on results from only a person image and a garment image, with no auxiliary masks.
Two-stage pipeline: a pre-trained mask-based model synthesizes a background-augmented paired dataset, then the model is fine-tuned on it for mask-free inference.
State-of-the-art garment texture preservation and visual realism across multiple benchmarks.
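The two-stage paradigm described above can be sketched in toy form. All function and variable names below are illustrative placeholders (the paper does not release this interface); images are stood in by strings so the data flow is visible.

```python
def mask_based_tryon(person, garment, mask):
    """Stand-in for a pre-trained mask-based VITON model (stage 1 only).
    In the real pipeline this would be a diffusion-based try-on network."""
    return f"{person}+{garment}"

def synthesize_pairs(persons, garments, masks, backgrounds):
    """Stage 1 (schematic): use the mask-based model to build a paired
    dataset, augmenting each sample with a varied background to mimic
    real-world scenes. Masks are consumed here and nowhere else."""
    dataset = []
    for p, g, m, bg in zip(persons, garments, masks, backgrounds):
        result = mask_based_tryon(p, g, m)
        dataset.append({
            "person": f"{p}@{bg}",        # background-augmented input image
            "garment": g,                 # target garment image
            "target": f"{result}@{bg}",   # synthesized try-on ground truth
        })
    return dataset

def finetune_mask_free(dataset):
    """Stage 2 (schematic): fine-tune on (person, garment) -> target pairs,
    so that inference takes only two images and no mask. Here we just
    materialize the mask-free training triples."""
    return [(s["person"], s["garment"], s["target"]) for s in dataset]
```

The key design point the sketch captures is that mask dependence is confined to offline dataset synthesis: the fine-tuned model's inputs are exactly the two images a user would provide.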
Zhenchen Wan
University of Melbourne, Melbourne, Australia
Yanwu Xu
University of Melbourne, Melbourne, Australia
Dongting Hu
University of Melbourne
Computer Vision, Generative AI
Weilun Cheng
University of Melbourne, Melbourne, Australia
Tianxi Chen
University of Melbourne, Melbourne, Australia
Zhaoqing Wang
Columbia University
Feng Liu
University of Melbourne, Melbourne, Australia
Tongliang Liu
Director, Sydney AI Centre, University of Sydney & Mohamed bin Zayed University of AI
Machine Learning, Learning with Noisy Labels, Trustworthy Machine Learning
Mingming Gong
University of Melbourne & Mohamed bin Zayed University of Artificial Intelligence
Causal Inference, Machine Learning, Computer Vision