🤖 AI Summary
High-quality instruction-image pairs for garment editing are scarce, and multimodal large language models (MLLMs) lack sufficient fashion-domain supervision. Method: We introduce EditGarment, the first instruction-tuned dataset for standalone garment editing. It comprises six practical, design-process-aligned instruction categories; a semantic-aware Fashion Edit Score (FES) metric that models dependencies among garment attributes; and an MLLM-based automated pipeline for synthesizing instruction-image triplets, refined via FES-guided high-precision filtering. Contribution/Results: From 52,257 candidate triplets, we curate 20,596 high-fidelity instruction-image-mask triplets. EditGarment substantially improves data availability and model generalizability for instruction-driven garment editing, establishing a benchmark and technical foundation for AI-driven fashion research.
📝 Abstract
Instruction-based garment editing enables precise image modifications via natural language, with broad applications in fashion design and customization. Unlike general editing tasks, it requires understanding garment-specific semantics and attribute dependencies. However, progress is limited by the scarcity of high-quality instruction-image pairs, as manual annotation is costly and hard to scale. While MLLMs have shown promise in automated data synthesis, their application to garment editing is constrained by imprecise instruction modeling and a lack of fashion-specific supervisory signals. To address these challenges, we present an automated pipeline for constructing a garment editing dataset. First, we define six editing instruction categories aligned with real-world fashion workflows to guide the generation of balanced and diverse instruction-image triplets. Second, we introduce the Fashion Edit Score (FES), a semantic-aware evaluation metric that captures semantic dependencies between garment attributes and provides reliable supervision during dataset construction. Using this pipeline, we construct 52,257 candidate triplets and retain 20,596 high-quality triplets to build EditGarment, the first instruction-based dataset tailored to standalone garment editing. The project page is https://yindq99.github.io/EditGarment-project/.
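The score-then-filter step of the pipeline (52,257 candidates in, 20,596 high-quality triplets out) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `Triplet` structure, the FES threshold, and the example scores are all assumptions made here for demonstration.

```python
# Hypothetical sketch of FES-guided filtering. Field names, the threshold
# value, and the sample scores are illustrative assumptions, not values
# taken from the EditGarment paper.
from dataclasses import dataclass

@dataclass
class Triplet:
    instruction: str
    fes: float  # semantic-aware Fashion Edit Score, assumed to lie in [0, 1]

def filter_by_fes(candidates, threshold=0.8):
    """Retain only candidate triplets whose FES clears the threshold."""
    return [t for t in candidates if t.fes >= threshold]

# Toy candidate pool standing in for the 52,257 synthesized triplets.
candidates = [
    Triplet("replace the round collar with a V-neck", 0.91),
    Triplet("shorten the sleeves to elbow length", 0.62),
    Triplet("change the fabric to denim", 0.85),
]

kept = filter_by_fes(candidates)
print(len(kept))  # 2: only the two triplets scoring >= 0.8 survive
```

In the actual pipeline, the FES for each candidate would come from the semantic-aware metric's evaluation of the instruction-image triplet rather than being assigned by hand.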