Garments2Look: A Multi-Reference Dataset for High-Fidelity Outfit-Level Virtual Try-On with Clothing and Accessories

📅 2026-03-14
🤖 AI Summary
Existing virtual try-on methods struggle with realistic full-outfit scenarios involving multiple garments, fine-grained categories, layered combinations, and diverse styles. To address this gap, this work introduces the first large-scale multimodal dataset designed specifically for outfit-level virtual try-on, covering 40 major categories and more than 300 fine-grained subcategories across 80,000 outfit triplets. Each triplet includes 3–12 reference garment images, a corresponding in-the-wild model photograph, and detailed textual annotations. The data are produced by a hybrid pipeline that combines heuristic styling rules, image synthesis, automated filtering, and manual validation. Benchmarking state-of-the-art models on the dataset reveals persistent problems with garment layering, style consistency, spatial alignment, and generation artifacts, which underscores both the difficulty and the research value of full-outfit virtual try-on.
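
To make the triplet structure concrete, here is a minimal sketch of one record as a Python dataclass. The field names (`garment_images`, `model_image`, `item_captions`, `tryon_caption`) and the one-caption-per-item rule are hypothetical assumptions, not the dataset's published file format. The only constraint taken from the source is the 3–12 reference-image range, and the helper computes the per-outfit average that the abstract reports as 4.48 for the full dataset.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical record layout; the released dataset's actual schema may differ.
@dataclass
class OutfitTriplet:
    garment_images: List[str]  # paths to the 3-12 reference garment/accessory images
    model_image: str           # path to the in-the-wild photo of a model wearing the outfit
    item_captions: List[str]   # assumed: one item-level text annotation per reference image
    tryon_caption: str         # text annotation describing the full try-on look

    def __post_init__(self) -> None:
        n = len(self.garment_images)
        # Range stated in the source: each outfit has 3-12 reference images.
        if not 3 <= n <= 12:
            raise ValueError(f"expected 3-12 reference images, got {n}")
        if len(self.item_captions) != n:
            raise ValueError("expected one item caption per reference image")


def mean_references(triplets: List[OutfitTriplet]) -> float:
    """Average number of reference images per outfit."""
    if not triplets:
        raise ValueError("no triplets given")
    return sum(len(t.garment_images) for t in triplets) / len(triplets)
```

For example, a record with three garment images and three captions passes validation, while a two-item outfit raises `ValueError`.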

📝 Abstract
Virtual try-on (VTON) has advanced single-garment visualization, yet real-world fashion centers on full outfits with multiple garments, accessories, fine-grained categories, layering, and diverse styling, all of which remain beyond current VTON systems. Existing datasets are category-limited and lack outfit diversity. We introduce Garments2Look, the first large-scale multimodal dataset for outfit-level VTON, comprising 80K many-garments-to-one-look pairs across 40 major categories and 300+ fine-grained subcategories. Each pair includes an outfit with 3–12 reference garment images (4.48 on average), a model image wearing the outfit, and detailed item-level and try-on textual annotations. To balance authenticity and diversity, we propose a synthesis pipeline that heuristically constructs outfit lists before generating try-on results, with the entire process subject to strict automated filtering and human validation to ensure data quality. To probe task difficulty, we adapt SOTA VTON methods and general-purpose image editing models to establish baselines. Results show that current methods struggle to try on complete outfits seamlessly and to infer correct layering and styling, leading to misalignment and artifacts.
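
The ordering of the synthesis pipeline (build an outfit list heuristically, generate a try-on image, apply automated filtering, then human validation) can be sketched as a control-flow skeleton. Everything concrete here is a hypothetical stand-in: the slot names, the one-item-per-required-slot rule, the `max_optional` cap, and the injected callables `generate`, `auto_filter`, and `human_ok`. The summary does not specify the authors' styling heuristics, generation model, or filter criteria.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical slot taxonomy; the real dataset spans 40 major categories
# and 300+ fine-grained subcategories.
REQUIRED = ["top", "bottom", "shoes"]
OPTIONAL = ["outerwear", "bag", "hat", "belt", "jewelry", "eyewear"]

Outfit = Dict[str, str]  # slot -> item identifier


def build_outfit(catalog: Dict[str, List[str]], rng: random.Random,
                 max_optional: int = 3) -> Outfit:
    """Hypothetical styling heuristic: one item per required slot, plus up to
    `max_optional` distinct optional layers or accessories."""
    outfit = {slot: rng.choice(catalog[slot]) for slot in REQUIRED}
    available = [s for s in OPTIONAL if catalog.get(s)]  # skip empty/missing slots
    for slot in rng.sample(available, k=min(max_optional, len(available))):
        outfit[slot] = rng.choice(catalog[slot])
    return outfit


def synthesize(catalog: Dict[str, List[str]],
               generate: Callable[[Outfit], Optional[str]],
               auto_filter: Callable[[Outfit, str], bool],
               human_ok: Callable[[Outfit, str], bool],
               n_target: int, seed: int = 0,
               max_tries: int = 10_000) -> List[Tuple[Outfit, str]]:
    """Outfit list first, then generation, automated filtering, human check."""
    rng = random.Random(seed)
    accepted: List[Tuple[Outfit, str]] = []
    for _ in range(max_tries):
        if len(accepted) >= n_target:
            break
        outfit = build_outfit(catalog, rng)
        image = generate(outfit)  # stand-in for a try-on image generator
        if image is None or not auto_filter(outfit, image):
            continue  # discarded by the automated stage
        if human_ok(outfit, image):  # stand-in for manual validation
            accepted.append((outfit, image))
    return accepted
```

Passing the stages in as callables keeps the skeleton independent of any particular generator or filter. With stubs such as `lambda o: "img"` for generation and `lambda o, i: True` for both checks, `synthesize(..., n_target=2)` returns two accepted pairs.
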
Problem

Research questions and friction points this paper is trying to address.

virtual try-on
outfit-level
multi-garment
fashion dataset
clothing layering
Innovation

Methods, ideas, or system contributions that make the work stand out.

outfit-level virtual try-on
multi-reference dataset
garment layering
multimodal fashion synthesis
high-fidelity VTON