OmniTry: Virtual Try-On Anything without Masks

📅 2025-08-19
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses key limitations of existing virtual try-on (VTON) methods: narrow applicability to clothing only, reliance on manually annotated masks, and poor generalization across object categories. The authors propose a mask-free, universal virtual try-on framework that synthesizes realistic try-on results for clothing, jewelry, and other wearable accessories spanning 12 object categories. Methodologically, they introduce a two-stage training paradigm: first, pretraining a repurposed image-inpainting model on large-scale unpaired data to achieve robust object localization; second, fine-tuning with a small amount of paired data to jointly optimize identity preservation and spatial accuracy. Crucially, the framework requires no input masks, which improves both localization accuracy and identity fidelity. Extensive evaluation on a newly constructed multi-category benchmark demonstrates consistent gains over state-of-the-art methods in both in-shop and in-the-wild settings.

๐Ÿ“ Abstract
Virtual Try-On (VTON) is a practical and widely applied task, for which most existing works focus on clothes. This paper presents OmniTry, a unified framework that extends VTON beyond garments to any wearable object, e.g., jewelry and accessories, in a mask-free setting for more practical application. When extending to various types of objects, data curation is challenging: it is hard to obtain paired images, i.e., the object image and the corresponding try-on result. To tackle this problem, we propose a two-stage pipeline. In the first stage, we leverage large-scale unpaired images, i.e., portraits with any wearable items, to train the model for mask-free localization. Specifically, we repurpose an inpainting model to automatically draw objects in suitable positions given an empty mask. In the second stage, the model is further fine-tuned with paired images to transfer the consistency of object appearance. We observe that the model after the first stage converges quickly even with few paired samples. OmniTry is evaluated on a comprehensive benchmark consisting of 12 common classes of wearable objects, with both in-shop and in-the-wild images. Experimental results show that OmniTry outperforms existing methods on both object localization and ID preservation. The code, model weights, and evaluation benchmark of OmniTry will be made publicly available at https://omnitry.github.io/.
Problem

Research questions and friction points this paper is trying to address.

Extends virtual try-on to any wearable objects without masks
Solves data scarcity for paired training images across categories
Improves object localization and identity preservation in try-on
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage pipeline for mask-free localization
Repurposed inpainting model for object positioning
Fine-tuning with paired images for appearance consistency
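The two-stage recipe above can be sketched as a training schedule: a long first pass over large-scale unpaired portraits with an all-zero mask (so the repurposed inpainting model must localize the object on its own), then a short second pass over the few paired (object, try-on) samples. This is an illustrative sketch only; all function and variable names here are hypothetical, not the authors' actual code.

```python
# Hypothetical sketch of OmniTry's two-stage training schedule.

def make_empty_mask(height, width):
    """Stage 1 feeds the inpainting model an all-zero ("empty") mask,
    so the model itself must decide where the wearable object belongs."""
    return [[0] * width for _ in range(height)]

def two_stage_schedule(unpaired_portraits, paired_samples,
                       stage1_steps, stage2_steps):
    """Return training steps in order: first mask-free localization on
    unpaired portraits, then appearance-consistency fine-tuning on the
    small paired set (which the paper reports converges quickly)."""
    steps = []
    for i in range(stage1_steps):
        portrait = unpaired_portraits[i % len(unpaired_portraits)]
        steps.append(("stage1_localization", portrait, make_empty_mask(2, 2)))
    for i in range(stage2_steps):
        obj, try_on = paired_samples[i % len(paired_samples)]
        steps.append(("stage2_consistency", obj, try_on))
    return steps
```

Note the asymmetry in step counts: Stage 1 dominates the budget because unpaired portraits are abundant, while Stage 2 needs only a handful of paired samples.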
Yutong Feng — Alibaba Tongyi Lab | Tsinghua University (Generative AI, Computer Vision)
Linlin Zhang — Zhejiang University
Hengyuan Cao — Zhejiang University
Yiming Chen — Kunbyte AI
Xiaoduan Feng — Kunbyte AI
Jian Cao — Kunbyte AI
Yuxiong Wu — Kunbyte AI
Bin Wang — Kunbyte AI