SMART-Editor: A Multi-Agent Framework for Human-Like Design Editing with Structural Integrity

📅 2025-07-30
🤖 AI Summary
To address global semantic inconsistency and visual disharmony when layout and content are edited together in structured (e.g., posters, web pages) and unstructured (e.g., natural images) domains, this paper proposes two methods within a multi-agent collaborative editing framework: Reward-Refine, a test-time optimization method, and RewardDPO, a training-time preference alignment technique. The approach is presented as the first to jointly ensure structural integrity and semantic coherence under a unified architecture, using reward-guided layout planning, preference-based optimization, and collaborative decision-making among generative-model-driven agents across domains. On the SMARTEdit-Bench benchmark, it outperforms baselines including InstructPix2Pix and HIVE, with gains of up to 15% in structured domains. Both automatic metrics and human evaluations indicate improvements in overall editing quality, fidelity, and semantic consistency.

📝 Abstract
We present SMART-Editor, a framework for compositional layout and content editing across structured (posters, websites) and unstructured (natural images) domains. Unlike prior models that perform local edits, SMART-Editor preserves global coherence through two strategies: Reward-Refine, an inference-time reward-guided refinement method, and RewardDPO, a training-time preference optimization approach using reward-aligned layout pairs. To evaluate model performance, we introduce SMARTEdit-Bench, a benchmark covering multi-domain, cascading edit scenarios. SMART-Editor outperforms strong baselines like InstructPix2Pix and HIVE, with RewardDPO achieving up to 15% gains in structured settings and Reward-Refine showing advantages on natural images. Automatic and human evaluations confirm the value of reward-guided planning in producing semantically consistent and visually aligned edits.
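
The abstract describes RewardDPO only as preference optimization over reward-aligned layout pairs and does not state its objective. As a rough illustration, the sketch below computes the standard DPO loss for a single preferred/rejected pair, on the assumption that the pair was ranked by a reward score. The function name `dpo_loss`, the `beta` default, and the log-probability inputs are hypothetical and not taken from the paper.

```python
import math

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair (illustrative, not the paper's code).

    Inputs are sequence log-probabilities of the preferred ("chosen") and
    dispreferred ("rejected") outputs under the trained policy and under a
    frozen reference model.
    """
    # How much more the policy favors each output than the reference does.
    chosen_margin = policy_logp_chosen - ref_logp_chosen
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), written in a numerically stable form.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))
```

The loss falls below log 2 once the policy favors the chosen layout more strongly than the reference model does, which is the behavior a preference-aligned editor would be trained toward.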
Problem

Research questions and friction points this paper is trying to address.

Preserving global coherence in layout and content editing, where prior models perform only local edits
Providing a benchmark for multi-domain, cascading edit scenarios
Outperforming baselines in both structured and unstructured domains
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent framework for human-like design editing
Reward-Refine method for global coherence
RewardDPO for training-time preference optimization
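
Reward-Refine is described as inference-time, reward-guided refinement, without further detail on this page. A minimal sketch of that general idea, assuming a toy reward that penalizes overlapping boxes and a greedy search that nudges one box at a time, could look like the following. The box format `(x, y, w, h)`, the reward, and the names `overlap_area`, `reward`, and `reward_refine` are all hypothetical; the paper's actual reward and search procedure are not given here.

```python
from itertools import product

def overlap_area(a, b):
    """Intersection area of two boxes given as (x, y, w, h)."""
    dx = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    dy = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    return max(dx, 0) * max(dy, 0)

def reward(layout):
    """Toy reward (assumption): negative total pairwise overlap, so 0 is best."""
    return -sum(overlap_area(layout[i], layout[j])
                for i in range(len(layout))
                for j in range(i + 1, len(layout)))

def reward_refine(layout, step=10, max_iters=20):
    """Greedy refinement: shift one box by one step and keep any move that raises the reward."""
    best, best_score = list(layout), reward(layout)
    moves = [(step, 0), (-step, 0), (0, step), (0, -step)]
    for _ in range(max_iters):
        improved = False
        for i, (dx, dy) in product(range(len(best)), moves):
            x, y, w, h = best[i]
            candidate = best[:i] + [(x + dx, y + dy, w, h)] + best[i + 1:]
            score = reward(candidate)
            if score > best_score:
                best, best_score, improved = candidate, score, True
        if not improved:
            break  # no single move helps any more
    return best, best_score

# Two overlapping boxes; refinement moves them apart without resizing them.
refined, score = reward_refine([(0, 0, 100, 50), (50, 0, 100, 50)])
```

The loop only illustrates the test-time pattern of proposing candidate edits, scoring them with a reward, and keeping improvements; a real system would score semantic and visual coherence rather than overlap alone.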
Ishani Mondal
PhD Student at the University of Maryland || Microsoft Research India || IIT Kharagpur
Multimodal+Multilingual Reasoning, Human-in-the-loop NLP
Meera Bharadwaj
University of Maryland, College Park
Ayush Roy
University of Maryland, College Park
Aparna Garimella
Adobe Inc
Natural Language Processing, Computational Social Science
Jordan Lee Boyd-Graber
University of Maryland, College Park