EditGarment: An Instruction-Based Garment Editing Dataset Constructed with Automated MLLM Synthesis and Semantic-Aware Evaluation

📅 2025-08-05
🤖 AI Summary
High-quality instruction-image pairs for garment editing are scarce, and automated data synthesis with multimodal large language models (MLLMs) is held back by imprecise instruction modeling and a lack of fashion-specific supervisory signals. Method: We introduce EditGarment, the first instruction-based dataset tailored to standalone garment editing. It rests on three components: six editing instruction categories aligned with real-world fashion workflows; Fashion Edit Score (FES), a semantic-aware evaluation metric that captures dependencies between garment attributes; and an automated MLLM-based pipeline that synthesizes instruction-image triplets, with FES providing supervision during construction. Contribution/Results: Of 52,257 candidate triplets, 20,596 high-quality triplets are retained to form EditGarment. The dataset and metric are aimed at easing the data scarcity that limits instruction-based garment editing.

📝 Abstract
Instruction-based garment editing enables precise image modifications via natural language, with broad applications in fashion design and customization. Unlike general editing tasks, it requires understanding garment-specific semantics and attribute dependencies. However, progress is limited by the scarcity of high-quality instruction-image pairs, as manual annotation is costly and hard to scale. While MLLMs have shown promise in automated data synthesis, their application to garment editing is constrained by imprecise instruction modeling and a lack of fashion-specific supervisory signals. To address these challenges, we present an automated pipeline for constructing a garment editing dataset. We first define six editing instruction categories aligned with real-world fashion workflows to guide the generation of balanced and diverse instruction-image triplets. Second, we introduce Fashion Edit Score, a semantic-aware evaluation metric that captures semantic dependencies between garment attributes and provides reliable supervision during construction. Using this pipeline, we construct a total of 52,257 candidate triplets and retain 20,596 high-quality triplets to build EditGarment, the first instruction-based dataset tailored to standalone garment editing. The project page is https://yindq99.github.io/EditGarment-project/.
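The filtering step described in the abstract (score each candidate triplet, keep the high-quality ones) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Triplet` fields, the `filter_triplets` and `retention_rate` helpers, and the idea of a single numeric threshold are all hypothetical, and the abstract does not give the Fashion Edit Score formula, so the score is treated here as a precomputed number.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Triplet:
    """One candidate sample. Field names are assumptions, not from the paper."""
    source_image: str   # path to the original garment image
    instruction: str    # natural-language editing instruction
    edited_image: str   # path to the edited garment image
    category: str       # one of the six instruction categories (unnamed here)
    score: float        # stand-in for a precomputed Fashion Edit Score


def filter_triplets(candidates: List[Triplet], threshold: float) -> List[Triplet]:
    """Keep candidates whose score meets the (hypothetical) threshold."""
    return [t for t in candidates if t.score >= threshold]


def retention_rate(kept: int, total: int) -> float:
    """Fraction of candidates that survive filtering."""
    if total <= 0:
        raise ValueError("total must be positive")
    return kept / total
```

With the counts reported in the abstract, `retention_rate(20596, 52257)` is about 0.394, meaning roughly 39.4% of the candidate triplets were retained.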
Problem

Research questions and friction points this paper is trying to address.

Lack of high-quality instruction-image pairs for garment editing
Imprecise instruction modeling in automated data synthesis for fashion
Absence of semantic-aware evaluation metrics for garment attribute dependencies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated pipeline for garment dataset construction
Fashion Edit Score for semantic-aware evaluation
Instruction categories aligned with fashion workflows
Deqiang Yin
Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu, China
Junyi Guo
Cornell University
Huanda Lu
NingboTech University
Fangyu Wu
Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu, China
Dongming Lu
Zhejiang University, Hangzhou, Zhejiang, China