Multi-turn Consistent Image Editing

📅 2025-05-07

📈 Citations: 0

✨ Influential: 0

career value

206K/year

🤖 AI Summary

Current single-step image editing methods struggle with ambiguous user intent, complex transformations, and scenarios requiring iterative refinement, often yielding inconsistent results. This paper introduces the first consistency-aware framework for multi-round interactive image editing. Our method addresses these challenges via four key innovations: (1) flow-matching-based precise image inversion to ensure high-fidelity initial edits; (2) an adaptive attention highlighting mechanism that dynamically localizes editable regions; (3) a dual-objective linear quadratic regulator (LQR)-guided stable sampling strategy that explicitly models and suppresses error accumulation across rounds; and (4) attention modulation informed by Transformer layer-role analysis to enhance cross-round semantic consistency. Extensive experiments demonstrate that our approach significantly outperforms single-step baselines in both multi-round editing success rate and visual fidelity, establishing a new paradigm for iterative, controllable, and highly consistent interactive image editing.

Technology Category

Application Category

📝 Abstract

Many real-world applications, such as interactive photo retouching, artistic content creation, and product design, require flexible and iterative image editing. However, existing image editing methods primarily focus on achieving the desired modifications in a single step, which often struggles with ambiguous user intent, complex transformations, or the need for progressive refinements. As a result, these methods frequently produce inconsistent outcomes or fail to meet user expectations. To address these challenges, we propose a multi-turn image editing framework that enables users to iteratively refine their edits, progressively achieving more satisfactory results. Our approach leverages flow matching for accurate image inversion and a dual-objective Linear Quadratic Regulators (LQR) for stable sampling, effectively mitigating error accumulation. Additionally, by analyzing the layer-wise roles of transformers, we introduce a adaptive attention highlighting method that enhances editability while preserving multi-turn coherence. Extensive experiments demonstrate that our framework significantly improves edit success rates and visual fidelity compared to existing methods.

Problem

Research questions and friction points this paper is trying to address.

Achieving consistent multi-turn image editing iteratively

Mitigating error accumulation in complex image transformations

Enhancing editability while preserving multi-turn coherence

Innovation

Methods, ideas, or system contributions that make the work stand out.

Flow matching for accurate image inversion

Dual-objective LQR for stable sampling

Adaptive attention highlighting for coherence

🔎 Similar Papers

No similar papers found.