VTBench: Comprehensive Benchmark Suite Towards Real-World Virtual Try-on Models

📅 2025-05-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing virtual try-on evaluation suffers from three key limitations: misalignment between quantitative metrics and human perception, overreliance on indoor-scene test sets, and absence of a systematic, real-world benchmark. To address these, we propose VTBench—the first comprehensive virtual try-on benchmark designed for real-world scenarios. Our approach introduces a hierarchical, decoupled evaluation framework that quantifies performance across five dimensions: image fidelity, texture preservation, background consistency under complex scenes, cross-category size adaptation, and hand-occlusion handling. Crucially, we incorporate the first large-scale human preference annotations to bridge objective metrics with subjective perception. Methodologically, we construct a multi-granularity real-world test set and conduct cross-scenario analysis, revealing a significant performance gap between indoor and real-world settings. The full benchmark—including data, evaluation protocols, generated outputs, and human annotations—is publicly released, substantially enhancing evaluation authenticity, interpretability, and practical guidance.

📝 Abstract
While virtual try-on has achieved significant progress, evaluating these models in real-world scenarios remains a challenge. A comprehensive benchmark is essential for three key reasons: (1) current metrics inadequately reflect human perception, particularly in unpaired try-on settings; (2) most existing test sets are limited to indoor scenarios, lacking the complexity needed for real-world evaluation; and (3) an ideal evaluation system should guide future advancements in virtual try-on generation. To address these needs, we introduce VTBench, a hierarchical benchmark suite that systematically decomposes virtual image try-on into disentangled dimensions, each equipped with tailored test sets and evaluation criteria. VTBench exhibits three key advantages: (1) Multi-Dimensional Evaluation Framework: the benchmark encompasses five critical dimensions of virtual try-on generation (overall image quality, texture preservation, complex background consistency, cross-category size adaptability, and hand-occlusion handling); granular evaluation metrics on the corresponding test sets pinpoint model capabilities and limitations across diverse, challenging scenarios. (2) Human Alignment: human preference annotations are provided for each test set, ensuring the benchmark's alignment with perceptual quality across all evaluation dimensions. (3) Valuable Insights: beyond standard indoor settings, we analyze model performance variations across dimensions and investigate the disparity between indoor and real-world try-on scenarios. To push the field of virtual try-on towards challenging real-world scenarios, VTBench will be open-sourced, including all test sets, evaluation protocols, generated results, and human annotations.
Problem

Research questions and friction points this paper is trying to address.

Current metrics fail to reflect human perception in virtual try-on
Existing test sets lack complexity for real-world evaluation scenarios
Need for a benchmark to guide future virtual try-on advancements
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical benchmark suite for virtual try-on
Multi-dimensional evaluation framework with human alignment
Open-sourced test sets and evaluation protocols
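The human-alignment idea above can be illustrated with a minimal sketch: compute how well an automated metric's ranking of generated try-on outputs agrees with human preference rankings, using Spearman rank correlation. This is a generic alignment check, not VTBench's actual protocol, and the scores in the example are made up for illustration.

```python
def rank(values):
    """Return 1-based ranks, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the run of tied values starting at position i
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank across the tie
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(metric_scores, human_scores):
    """Spearman rank correlation: +1 = perfect agreement, -1 = reversed."""
    rx, ry = rank(metric_scores), rank(human_scores)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical scores for five generated outputs on one test set:
metric = [0.81, 0.65, 0.92, 0.40, 0.73]   # e.g. an automated fidelity metric
human = [4.2, 3.1, 4.8, 2.0, 3.9]         # mean human preference ratings
print(f"metric-human alignment: {spearman(metric, human):.3f}")
```

A metric whose rankings track human preference closely yields a correlation near +1; a low or negative value flags exactly the metric-perception misalignment the benchmark is designed to expose.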