🤖 AI Summary
To address global semantic inconsistency and visual disharmony in joint layout and content editing across structured (e.g., posters, web pages) and unstructured (e.g., natural images) domains, this paper proposes two techniques within a multi-agent collaborative editing framework: Reward-Refine, a test-time optimization method, and RewardDPO, a training-time preference alignment technique. The approach is the first to jointly ensure structural integrity and semantic coherence under a unified architecture, combining reward-guided layout planning, preference-based reinforcement learning, and generative-model-driven cross-domain collaborative decision-making. Evaluated on SMARTEdit-Bench, the method significantly outperforms baselines including InstructPix2Pix and HIVE, achieving up to a 15% improvement in structured settings. Both automated metrics and human evaluations confirm substantial gains in overall editing quality, fidelity, and semantic consistency.
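As a rough illustration of the test-time strategy (the summary does not specify the algorithm), Reward-Refine can be read as a propose-score-select loop: sample candidate edits, score them with a reward model, and continue refining from the best one. The sketch below is a minimal, hypothetical Python rendering under that assumption; `editor`, `reward_model`, and all parameters are illustrative names, not the paper's actual API.

```python
def reward_refine(editor, reward_model, image, instruction,
                  num_candidates=4, num_rounds=3):
    """Hypothetical inference-time reward-guided refinement loop.

    `editor(state, instruction)` proposes an edited image/layout;
    `reward_model(state, instruction)` scores semantic consistency
    and visual alignment. Both interfaces are assumptions.
    """
    best_edit = image
    best_score = reward_model(best_edit, instruction)
    for _ in range(num_rounds):
        # Sample several candidate edits from the current best state.
        candidates = [editor(best_edit, instruction)
                      for _ in range(num_candidates)]
        # Score each candidate with the reward model.
        scored = [(reward_model(c, instruction), c) for c in candidates]
        top_score, top_edit = max(scored, key=lambda pair: pair[0])
        if top_score <= best_score:
            break  # no candidate improves on the current edit; stop refining
        best_edit, best_score = top_edit, top_score
    return best_edit
```

The greedy accept-if-better rule and the early stop are design choices of this sketch, not claims about the paper's procedure.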
📝 Abstract
We present SMART-Editor, a framework for compositional layout and content editing across structured (posters, websites) and unstructured (natural images) domains. Unlike prior models that perform local edits, SMART-Editor preserves global coherence through two strategies: Reward-Refine, an inference-time reward-guided refinement method, and RewardDPO, a training-time preference optimization approach using reward-aligned layout pairs. To evaluate model performance, we introduce SMARTEdit-Bench, a benchmark covering multi-domain, cascading edit scenarios. SMART-Editor outperforms strong baselines such as InstructPix2Pix and HIVE, with RewardDPO achieving up to 15% gains in structured settings and Reward-Refine showing advantages on natural images. Automatic and human evaluations confirm the value of reward-guided planning in producing semantically consistent and visually aligned edits.
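The abstract describes RewardDPO only as preference optimization over reward-aligned layout pairs. Assuming it follows the standard Direct Preference Optimization (DPO) objective, with pairs ranked by the reward model, the loss could look like the sketch below; the function and argument names are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def reward_dpo_loss(policy_logps_w, policy_logps_l,
                    ref_logps_w, ref_logps_l, beta=0.1):
    """Standard DPO loss over (preferred, dispreferred) layout pairs.

    `*_w` tensors hold log-probabilities of the reward-preferred layout,
    `*_l` those of the dispreferred one, under the trainable policy and
    a frozen reference model respectively.
    """
    # Implicit reward margin: policy-vs-reference log-ratio difference.
    logits = beta * ((policy_logps_w - ref_logps_w)
                     - (policy_logps_l - ref_logps_l))
    # Maximize the probability that the preferred layout wins the pair.
    return -F.logsigmoid(logits).mean()
```

In this reading, the reward model's role is confined to constructing and ranking the layout pairs; whether SMART-Editor modifies the objective itself is not stated in the abstract.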