OmniVTON: Training-Free Universal Virtual Try-On

πŸ“… 2025-07-20
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Existing virtual try-on (VTON) methods face a fundamental trade-off: supervised in-shop approaches achieve high fidelity but suffer from poor generalization, whereas unsupervised in-the-wild methods exhibit strong adaptability yet are hampered by data bias and conditional coupling. To address this, we propose OmniVTONβ€”the first training-free, universal VTON framework. Its core innovation lies in decoupling garment texture from human pose constraints: structural features are extracted via DDIM inversion; a garment prior generation mechanism is introduced; and a continuous boundary stitching technique is designed to mitigate multi-condition bias inherent in diffusion models. OmniVTON supports cross-domain (in-shop/in-the-wild), cross-dataset, and single-image multi-person try-on. Quantitative and qualitative evaluations demonstrate significant improvements over state-of-the-art methods in fidelity, pose alignment accuracy, and photorealism. OmniVTON thus establishes the first truly zero-shot, highly generalizable virtual try-on solution.

πŸ“ Abstract
Image-based Virtual Try-On (VTON) techniques rely on either supervised in-shop approaches, which ensure high fidelity but struggle with cross-domain generalization, or unsupervised in-the-wild methods, which improve adaptability but remain constrained by data biases and limited universality. A unified, training-free solution that works across both scenarios remains an open challenge. We propose OmniVTON, the first training-free universal VTON framework that decouples garment and pose conditioning to achieve both texture fidelity and pose consistency across diverse settings. To preserve garment details, we introduce a garment prior generation mechanism that aligns clothing with the body, followed by a continuous boundary stitching technique to achieve fine-grained texture retention. For precise pose alignment, we utilize DDIM inversion to capture structural cues while suppressing texture interference, ensuring accurate body alignment independent of the original image textures. By disentangling garment and pose constraints, OmniVTON eliminates the bias inherent in diffusion models when handling multiple conditions simultaneously. Experimental results demonstrate that OmniVTON achieves superior performance across diverse datasets, garment types, and application scenarios. Notably, it is the first framework capable of multi-human VTON, enabling realistic garment transfer across multiple individuals in a single scene. Code is available at https://github.com/Jerome-Young/OmniVTON
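The pose-alignment step above relies on DDIM inversion: because DDIM sampling is deterministic, an image can be mapped back to a noise latent that preserves its structure, which can then guide re-synthesis. As a toy illustration of the mechanics only (not the paper's implementation, which uses a trained noise-prediction network), the NumPy sketch below uses a constant stand-in for the noise predictor to show that DDIM inversion followed by DDIM sampling round-trips a latent exactly:

```python
import numpy as np

def ddim_invert_step(x_t, eps, a_t, a_next):
    """One deterministic DDIM inversion step: t -> t+1 (toward noise)."""
    x0_pred = (x_t - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
    return np.sqrt(a_next) * x0_pred + np.sqrt(1.0 - a_next) * eps

def ddim_sample_step(x_t, eps, a_t, a_prev):
    """One deterministic DDIM sampling step: t -> t-1 (toward image)."""
    x0_pred = (x_t - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
    return np.sqrt(a_prev) * x0_pred + np.sqrt(1.0 - a_prev) * eps

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 4))     # toy "clean latent"
eps = rng.normal(size=(4, 4))   # constant stand-in for the UNet's noise prediction
alphas = [0.99, 0.9, 0.7, 0.4]  # toy cumulative alpha-bar schedule (decreasing in t)

# Invert: image -> noise
z = x
for a_t, a_next in zip(alphas[:-1], alphas[1:]):
    z = ddim_invert_step(z, eps, a_t, a_next)

# Sample back: noise -> image
rev = alphas[::-1]
for a_t, a_prev in zip(rev[:-1], rev[1:]):
    z = ddim_sample_step(z, eps, a_t, a_prev)

print(np.allclose(z, x))  # True: the round trip recovers the original latent
```

In a real diffusion model the noise prediction depends on the current latent and timestep, so inversion additionally assumes the predicted noise changes little between adjacent steps; the recovered noise latent then retains the structural layout of the input, which is what OmniVTON exploits for pose alignment.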
Problem

Research questions and friction points this paper is trying to address.

Lack of training-free universal virtual try-on solution
Difficulty in cross-domain generalization and data biases
Challenges in preserving garment details and pose alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free universal VTON framework
Garment prior generation and boundary stitching
DDIM inversion for precise pose alignment
Authors:
Zhaotong Yang (Ocean University of China)
Yuhui Li (Peking University; LLM Inference Acceleration, LLM Alignment)
Shengfeng He (Singapore Management University; Visual Computing, Generative Models, Computer Vision, Computational Photography, Computer Graphics)
Xinzhe Li (Ocean University of China)
Yangyang Xu (Harbin Institute of Technology (Shenzhen))
Junyu Dong (Ocean University of China)
Yong Du (Ocean University of China)