OmniVTON: Training-Free Universal Virtual Try-On

πŸ“… 2025-07-20
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Existing virtual try-on (VTON) methods face a fundamental trade-off: supervised in-shop approaches achieve high fidelity but suffer from poor generalization, whereas unsupervised in-the-wild methods exhibit strong adaptability yet are hampered by data bias and conditional coupling. To address this, we propose OmniVTONβ€”the first training-free, universal VTON framework. Its core innovation lies in decoupling garment texture from human pose constraints: structural features are extracted via DDIM inversion; a garment prior generation mechanism is introduced; and a continuous boundary stitching technique is designed to mitigate multi-condition bias inherent in diffusion models. OmniVTON supports cross-domain (in-shop/in-the-wild), cross-dataset, and single-image multi-person try-on. Quantitative and qualitative evaluations demonstrate significant improvements over state-of-the-art methods in fidelity, pose alignment accuracy, and photorealism. OmniVTON thus establishes the first truly zero-shot, highly generalizable virtual try-on solution.

πŸ“ Abstract
Image-based Virtual Try-On (VTON) techniques rely on either supervised in-shop approaches, which ensure high fidelity but struggle with cross-domain generalization, or unsupervised in-the-wild methods, which improve adaptability but remain constrained by data biases and limited universality. A unified, training-free solution that works across both scenarios remains an open challenge. We propose OmniVTON, the first training-free universal VTON framework that decouples garment and pose conditioning to achieve both texture fidelity and pose consistency across diverse settings. To preserve garment details, we introduce a garment prior generation mechanism that aligns clothing with the body, followed by a continuous boundary stitching technique to achieve fine-grained texture retention. For precise pose alignment, we utilize DDIM inversion to capture structural cues while suppressing texture interference, ensuring accurate body alignment independent of the original image textures. By disentangling garment and pose constraints, OmniVTON eliminates the bias inherent in diffusion models when handling multiple conditions simultaneously. Experimental results demonstrate that OmniVTON achieves superior performance across diverse datasets, garment types, and application scenarios. Notably, it is the first framework capable of multi-human VTON, enabling realistic garment transfer across multiple individuals in a single scene. Code is available at https://github.com/Jerome-Young/OmniVTON
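The pose-alignment step above relies on DDIM inversion: because DDIM sampling is deterministic, an image can be mapped back to a noise latent that preserves its structure, which can then guide re-synthesis. As a toy illustration of the mechanics only (not the paper's implementation, which uses a trained noise-prediction network), the NumPy sketch below uses a constant stand-in for the noise predictor to show that DDIM inversion followed by DDIM sampling round-trips a latent exactly:

```python
import numpy as np

def ddim_invert_step(x_t, eps, a_t, a_next):
    """One deterministic DDIM inversion step: t -> t+1 (toward noise)."""
    x0_pred = (x_t - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
    return np.sqrt(a_next) * x0_pred + np.sqrt(1.0 - a_next) * eps

def ddim_sample_step(x_t, eps, a_t, a_prev):
    """One deterministic DDIM sampling step: t -> t-1 (toward image)."""
    x0_pred = (x_t - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
    return np.sqrt(a_prev) * x0_pred + np.sqrt(1.0 - a_prev) * eps

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 4))     # toy "clean latent"
eps = rng.normal(size=(4, 4))   # constant stand-in for the UNet's noise prediction
alphas = [0.99, 0.9, 0.7, 0.4]  # toy cumulative alpha-bar schedule (decreasing in t)

# Invert: image -> noise
z = x
for a_t, a_next in zip(alphas[:-1], alphas[1:]):
    z = ddim_invert_step(z, eps, a_t, a_next)

# Sample back: noise -> image
rev = alphas[::-1]
for a_t, a_prev in zip(rev[:-1], rev[1:]):
    z = ddim_sample_step(z, eps, a_t, a_prev)

print(np.allclose(z, x))  # True: the round trip recovers the original latent
```

In a real diffusion model the noise prediction depends on the current latent and timestep, so inversion additionally assumes the predicted noise changes little between adjacent steps; the recovered noise latent then retains the structural layout of the input, which is what OmniVTON exploits for pose alignment.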
Problem

Research questions and friction points this paper is trying to address.

Lack of training-free universal virtual try-on solution
Difficulty in cross-domain generalization and data biases
Challenges in preserving garment details and pose alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free universal VTON framework
Garment prior generation and boundary stitching
DDIM inversion for precise pose alignment
Authors:
Zhaotong Yang (Ocean University of China)
Yuhui Li (Peking University; LLM Inference Acceleration, LLM Alignment)
Shengfeng He (Singapore Management University; Visual Computing, Generative Models, Computer Vision, Computational Photography, Computer Graphics)
Xinzhe Li (Ocean University of China)
Yangyang Xu (Harbin Institute of Technology (Shenzhen))
Junyu Dong (Ocean University of China)
Yong Du (Ocean University of China)