MV-Fashion: Towards Enabling Virtual Try-On and Size Estimation with Multi-View Paired Data

📅 2026-03-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing 4D human datasets commonly lack realistic garment dynamics, fine-grained annotations, and the paired data needed for virtual try-on and size estimation in fashion research. To address this gap, this work introduces MV-Fashion, a large-scale multi-view video dataset of 80 subjects and 3,273 sequences (72.5 million frames in total), capturing synchronized multi-layer clothing configurations alongside corresponding flat-lay product images. The dataset provides pixel-level semantic segmentation, fabric elasticity attributes, and 3D point clouds. MV-Fashion is the first to enable high-fidelity multi-view, multi-layer outfit capture under real-world conditions with precise annotations and aligned flat-to-worn image pairs, thereby supporting tasks such as virtual try-on, size estimation, and novel view synthesis, for which it also establishes baseline benchmarks.

📝 Abstract
Existing 4D human datasets fall short for fashion-specific research, lacking either realistic garment dynamics or task-specific annotations. Synthetic datasets suffer from a realism gap, whereas real-world captures lack the detailed annotations and paired data required for virtual try-on (VTON) and size estimation tasks. To bridge this gap, we introduce MV-Fashion, a large-scale, multi-view video dataset engineered for domain-specific fashion analysis. MV-Fashion features 3,273 sequences (72.5 million frames) from 80 diverse subjects wearing 3-10 outfits each. It is designed to capture complex, real-world garment dynamics, including multiple layers and varied styling (e.g., rolled sleeves, tucked shirts). A core contribution is a rich data representation that includes pixel-level semantic annotations, ground-truth material properties such as elasticity, and 3D point clouds. Crucially for VTON applications, MV-Fashion provides paired data: multi-view synchronized captures of worn garments alongside their corresponding flat, catalogue images. We leverage this dataset to establish baselines for fashion-centric tasks, including virtual try-on, clothing size estimation, and novel view synthesis. The dataset is available at https://hunorlaczko.github.io/MV-Fashion
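The paired worn/flat structure described in the abstract can be sketched as a simple record type. All field names, paths, and the loading convention below are illustrative assumptions based only on the abstract, not the dataset's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical record for one MV-Fashion sequence; every field name here
# is an assumption inferred from the abstract, not the released format.
@dataclass
class FashionSequence:
    subject_id: int                 # one of the 80 subjects
    sequence_id: int                # one of the 3,273 sequences
    worn_views: List[str]           # synchronized multi-view frame paths
    flat_images: List[str]          # paired flat-lay catalogue photos
    segmentation: List[str]         # pixel-level semantic masks per view
    elasticity: float               # ground-truth fabric elasticity attribute
    point_cloud: str                # per-frame 3D point cloud path
    layers: List[str] = field(default_factory=list)  # garments, inner to outer

# A VTON training pair couples each worn-garment view with its flat image.
seq = FashionSequence(
    subject_id=7,
    sequence_id=1042,
    worn_views=["cam00/frame_0001.jpg", "cam01/frame_0001.jpg"],
    flat_images=["catalogue/shirt_front.jpg"],
    segmentation=["cam00/seg_0001.png", "cam01/seg_0001.png"],
    elasticity=0.42,
    point_cloud="pointclouds/frame_0001.ply",
    layers=["t-shirt", "jacket"],
)
vton_pairs = [(w, f) for w in seq.worn_views for f in seq.flat_images]
print(len(vton_pairs))  # 2 worn views x 1 flat image = 2 pairs
```

The flat-to-worn pairing is what distinguishes this dataset for VTON: each catalogue image can supervise every synchronized camera view of the same garment being worn.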
Problem

Research questions and friction points this paper is trying to address.

virtual try-on
size estimation
fashion dataset
garment dynamics
paired data
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-view paired data
virtual try-on
garment dynamics
size estimation
fashion dataset
Hunor Laczkó
Universitat Autònoma de Barcelona, Computer Vision Center, Universitat de Barcelona
Libang Jia
Computer Vision Center, Universitat de Barcelona
Loc-Phat Truong
Computer Vision Center, Universitat de Barcelona
Diego Hernández
Computer Vision Center, Universitat de Barcelona
Sergio Escalera
Prof., ICREA Academy, University of Barcelona, Computer Vision Center, ELLIS & IAPR & AAIA Fellow
Human Behavior Analysis · Machine Learning · Computer Vision · Affective Computing · Social Signal Processing
Jordi Gonzalez
Universitat Autònoma de Barcelona, Computer Vision Center
Meysam Madadi
PostDoc Researcher, University of Barcelona
Computer Vision · 3D