🤖 AI Summary
High-quality instruction-image pairs for garment editing are scarce, and multimodal large language models (MLLMs) lack sufficient fashion-domain supervision. Method: We introduce EditGarment, the first instruction-tuned dataset for standalone garment editing. It comprises six practical, design-process-aligned instruction categories; a semantic-aware Fashion Edit Score (FES) metric that models dependencies among garment attributes; and an MLLM-based automated pipeline for synthesizing instruction-image triplets, refined via FES-guided high-precision filtering. Contribution/Results: From 52,257 candidate triplets, we curate 20,596 high-fidelity instruction-image-mask triplets. EditGarment substantially improves data availability and model generalizability for instruction-driven garment editing, establishing a benchmark and technical foundation for AI-driven fashion research.
📝 Abstract
Instruction-based garment editing enables precise image modifications via natural language, with broad applications in fashion design and customization. Unlike general editing tasks, it requires understanding garment-specific semantics and attribute dependencies. However, progress is limited by the scarcity of high-quality instruction-image pairs, as manual annotation is costly and hard to scale. While MLLMs have shown promise in automated data synthesis, their application to garment editing is constrained by imprecise instruction modeling and a lack of fashion-specific supervisory signals. To address these challenges, we present an automated pipeline for constructing a garment editing dataset. First, we define six editing instruction categories aligned with real-world fashion workflows to guide the generation of balanced and diverse instruction-image triplets. Second, we introduce the Fashion Edit Score (FES), a semantic-aware evaluation metric that captures semantic dependencies between garment attributes and provides reliable supervision during dataset construction. Using this pipeline, we construct 52,257 candidate triplets and retain 20,596 high-quality triplets to build EditGarment, the first instruction-based dataset tailored to standalone garment editing. The project page is https://yindq99.github.io/EditGarment-project/.
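The score-then-filter step of the pipeline (52,257 candidates in, 20,596 high-quality triplets out) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `Triplet` structure, the FES threshold, and the example scores are all assumptions made here for demonstration.

```python
# Hypothetical sketch of FES-guided filtering. Field names, the threshold
# value, and the sample scores are illustrative assumptions, not values
# taken from the EditGarment paper.
from dataclasses import dataclass

@dataclass
class Triplet:
    instruction: str
    fes: float  # semantic-aware Fashion Edit Score, assumed to lie in [0, 1]

def filter_by_fes(candidates, threshold=0.8):
    """Retain only candidate triplets whose FES clears the threshold."""
    return [t for t in candidates if t.fes >= threshold]

# Toy candidate pool standing in for the 52,257 synthesized triplets.
candidates = [
    Triplet("replace the round collar with a V-neck", 0.91),
    Triplet("shorten the sleeves to elbow length", 0.62),
    Triplet("change the fabric to denim", 0.85),
]

kept = filter_by_fes(candidates)
print(len(kept))  # 2: only the two triplets scoring >= 0.8 survive
```

In the actual pipeline, the FES for each candidate would come from the semantic-aware metric's evaluation of the instruction-image triplet rather than being assigned by hand.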