CrossVTON: Mimicking the Logic Reasoning on Cross-category Virtual Try-on guided by Tri-zone Priors

📅 2025-02-20
🤖 AI Summary
Cross-category virtual try-on faces two core challenges: size mismatch between garment and body, and ambiguity about the function of each region of the model image. Both reduce visual realism and make results less robust to deformation. To address these, we propose a tri-zone disentanglement framework grounded in human-like logical reasoning. Our method introduces the first tri-zone prior guidance mechanism, covering try-on, reconstruction, and imagination zones, together with an iterative cross-category data constructor that enables fine-grained semantic alignment and generalizable inference. It integrates region-aware generative modeling, conditional tri-zone mask prediction, iterative adversarial data augmentation, and a multi-stage collaborative synthesis network. Evaluated on cross-category virtual try-on benchmarks, our approach achieves state-of-the-art performance, improving deformation robustness and texture fidelity. It outperforms existing methods in both quantitative metrics and qualitative visual comparisons.

📝 Abstract
Despite remarkable progress in image-based virtual try-on systems, generating realistic and robust fitting images for cross-category virtual try-on remains a challenging task. The primary difficulty arises from the absence of human-like reasoning, which involves addressing size mismatches between garments and models while recognizing and leveraging the distinct functionalities of various regions within the model images. To address this issue, we draw inspiration from human cognitive processes and disentangle the complex reasoning required for cross-category try-on into a structured framework. This framework systematically decomposes the model image into three distinct regions: try-on, reconstruction, and imagination zones. Each zone plays a specific role in accommodating the garment and facilitating realistic synthesis. To endow the model with robust reasoning capabilities for cross-category scenarios, we propose an iterative data constructor. This constructor encompasses diverse scenarios, including intra-category try-on, any-to-dress transformations (replacing any garment category with a dress), and dress-to-any transformations (replacing a dress with another garment category). Utilizing the generated dataset, we introduce a tri-zone priors generator that intelligently predicts the try-on, reconstruction, and imagination zones by analyzing how the input garment is expected to align with the model image. Guided by these tri-zone priors, our proposed method, CrossVTON, achieves state-of-the-art performance, surpassing existing baselines in both qualitative and quantitative evaluations. Notably, it demonstrates superior capability in handling cross-category virtual try-on, meeting the complex demands of real-world applications.
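The abstract does not spell out how the three zones are computed, so the following is only a minimal sketch of one plausible reading, not the authors' tri-zone priors generator. It assumes two hypothetical binary masks over the model image: `old_garment` (pixels covered by the garment currently worn) and `new_garment` (pixels the input garment is expected to cover). Under that assumption, pixels the new garment covers form the try-on zone, pixels the old garment exposes once removed form the imagination zone, and everything else is kept as the reconstruction zone. The names `Zone` and `tri_zone_map` are illustrative.

```python
from enum import IntEnum


class Zone(IntEnum):
    """Hypothetical integer labels for the three zones."""
    RECONSTRUCTION = 0  # keep the original pixel
    TRY_ON = 1          # the new garment is placed here
    IMAGINATION = 2     # content must be synthesized (e.g. newly exposed areas)


def tri_zone_map(old_garment, new_garment):
    """Label every pixel with a zone, given two binary masks of equal shape.

    Assumption (not from the paper): zones are derived purely from mask
    overlap. The paper predicts them with a learned generator instead.
    """
    if len(old_garment) != len(new_garment) or any(
        len(a) != len(b) for a, b in zip(old_garment, new_garment)
    ):
        raise ValueError("masks must have the same shape")

    zones = []
    for old_row, new_row in zip(old_garment, new_garment):
        row = []
        for old_px, new_px in zip(old_row, new_row):
            if new_px:
                row.append(Zone.TRY_ON)
            elif old_px:
                # Old garment removed, new garment does not reach here.
                row.append(Zone.IMAGINATION)
            else:
                row.append(Zone.RECONSTRUCTION)
        zones.append(row)
    return zones


# Toy 2x3 image: a long garment replaced by a much shorter one.
old = [[0, 1, 1],
       [0, 1, 1]]
new = [[0, 1, 0],
       [0, 0, 0]]
zone_map = tri_zone_map(old, new)
```

In this toy case the size mismatch shows up directly: most of the area the old garment covered falls into the imagination zone, because the shorter garment leaves it uncovered.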
Problem

Research questions and friction points this paper is trying to address.

Addressing size mismatches in cross-category virtual try-on.
Developing human-like reasoning for garment and model alignment.
Enhancing realism and robustness in virtual try-on systems.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tri-zone priors generator
Iterative data constructor
Cross-category virtual try-on
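The abstract lists three scenarios covered by the iterative data constructor: intra-category try-on, any-to-dress, and dress-to-any. As a rough illustration only, the sketch below sorts a (worn garment, target garment) pair into one of those scenarios. The category labels (`"upper"`, `"lower"`, `"dress"`) and the names `Scenario` and `classify_pair` are assumptions made for this example; the source does not describe how the constructor is implemented.

```python
from enum import Enum
from typing import Optional


class Scenario(Enum):
    """The three scenarios named in the abstract."""
    INTRA_CATEGORY = "intra-category"
    ANY_TO_DRESS = "any-to-dress"
    DRESS_TO_ANY = "dress-to-any"


DRESS = "dress"  # hypothetical category label


def classify_pair(worn: str, target: str) -> Optional[Scenario]:
    """Map a (worn, target) category pair to a scenario.

    Returns None for pairs outside the three listed scenarios,
    e.g. replacing an upper garment with a lower garment.
    """
    if worn == target:
        return Scenario.INTRA_CATEGORY
    if target == DRESS:
        return Scenario.ANY_TO_DRESS
    if worn == DRESS:
        return Scenario.DRESS_TO_ANY
    return None


pairs = [("upper", "upper"), ("lower", "dress"), ("dress", "upper")]
scenarios = [classify_pair(worn, target) for worn, target in pairs]
```

Treating dress-to-dress as intra-category is a choice made here for simplicity; the abstract does not say how that case is handled.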