Multi-turn Consistent Image Editing

📅 2025-05-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current single-step image editing methods struggle with ambiguous user intent, complex transformations, and scenarios requiring iterative refinement, often yielding inconsistent results. This paper introduces the first consistency-aware framework for multi-round interactive image editing. Our method addresses these challenges via four key innovations: (1) flow-matching-based precise image inversion to ensure high-fidelity initial edits; (2) an adaptive attention highlighting mechanism that dynamically localizes editable regions; (3) a dual-objective linear quadratic regulator (LQR)-guided stable sampling strategy that explicitly models and suppresses error accumulation across rounds; and (4) attention modulation informed by Transformer layer-role analysis to enhance cross-round semantic consistency. Extensive experiments demonstrate that our approach significantly outperforms single-step baselines in both multi-round editing success rate and visual fidelity, establishing a new paradigm for iterative, controllable, and highly consistent interactive image editing.

📝 Abstract
Many real-world applications, such as interactive photo retouching, artistic content creation, and product design, require flexible and iterative image editing. However, existing image editing methods primarily focus on achieving the desired modifications in a single step, and they often struggle with ambiguous user intent, complex transformations, or the need for progressive refinement. As a result, these methods frequently produce inconsistent outcomes or fail to meet user expectations. To address these challenges, we propose a multi-turn image editing framework that enables users to iteratively refine their edits, progressively achieving more satisfactory results. Our approach leverages flow matching for accurate image inversion and a dual-objective Linear Quadratic Regulator (LQR) for stable sampling, effectively mitigating error accumulation. Additionally, by analyzing the layer-wise roles of transformers, we introduce an adaptive attention highlighting method that enhances editability while preserving multi-turn coherence. Extensive experiments demonstrate that our framework significantly improves edit success rates and visual fidelity compared to existing methods.
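
To make the inversion idea in the abstract concrete, here is a minimal Python sketch of why the accuracy of ODE inversion matters for round-trip fidelity. It is not the paper's method: `velocity` is a hypothetical stand-in for a learned flow-matching velocity network, the time convention (t = 1 for the image, t = 0 for noise) is an assumption, and plain Euler integration is used only for illustration. The sketch inverts a toy "image" to a latent, samples it back, and measures the reconstruction error at two step counts.

```python
import numpy as np

def velocity(x, t):
    # Hypothetical stand-in for a learned velocity field v(x, t);
    # a real system would call a trained network here.
    return np.tanh(x) * (1.0 - t) - 0.5 * x

def integrate(x, t_start, t_end, steps):
    """Euler integration of dx/dt = velocity(x, t) from t_start to t_end."""
    dt = (t_end - t_start) / steps  # negative when integrating backward
    t = t_start
    for _ in range(steps):
        x = x + dt * velocity(x, t)
        t += dt
    return x

def round_trip_error(image, steps):
    # Assumed convention: t = 1 is the image, t = 0 is the noise latent.
    latent = integrate(image, 1.0, 0.0, steps)  # inversion: image -> latent
    recon = integrate(latent, 0.0, 1.0, steps)  # sampling: latent -> image
    return float(np.max(np.abs(recon - image)))

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))  # toy "image"

err_coarse = round_trip_error(image, steps=10)
err_fine = round_trip_error(image, steps=1000)
print(f"10 steps: {err_coarse:.2e}, 1000 steps: {err_fine:.2e}")
```

In this toy setting the reconstruction error shrinks as the step size decreases. In a multi-turn setting, any residual inversion error would be carried into the next round, which is the accumulation problem the abstract refers to.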
Problem

Research questions and friction points this paper is trying to address.

Achieving consistent results across iterative, multi-turn image editing
Mitigating error accumulation in complex image transformations
Enhancing editability while preserving multi-turn coherence
Innovation

Methods, ideas, or system contributions that make the work stand out.

Flow matching for accurate image inversion
Dual-objective LQR for stable sampling
Adaptive attention highlighting for editability and multi-turn coherence
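
As background for the LQR item above, the following Python sketch shows a textbook finite-horizon, discrete-time linear quadratic regulator computed by backward Riccati recursion. It is not the paper's dual-objective formulation, which this page does not specify. The dynamics (`A`, `B`), cost weights (`Q`, `R`, `Qf`), horizon, and the toy state and target are all assumptions chosen for illustration.

```python
import numpy as np

def lqr_gains(A, B, Q, R, Qf, horizon):
    """Backward Riccati recursion; returns feedback gains K_0 .. K_{N-1}."""
    P = Qf
    gains = []
    for _ in range(horizon):
        # K = (R + B^T P B)^{-1} B^T P A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # P = Q + A^T P A - A^T P B K
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]  # computed last-to-first, so reverse

def rollout(x0, target, A, B, gains):
    """Steer x0 toward target by regulating the error e = x - target.

    Valid here because A is the identity, so target is a fixed point.
    """
    e = x0 - target
    for K in gains:
        u = -K @ e          # optimal control for the current error
        e = A @ e + B @ u   # error dynamics
    return e + target

n = 4
A = np.eye(n)        # assumed dynamics: x_{k+1} = x_k + u_k
B = np.eye(n)
Q = np.eye(n)        # penalty on deviation from the target
R = 0.1 * np.eye(n)  # penalty on control effort
Qf = np.eye(n)       # terminal penalty

rng = np.random.default_rng(0)
x0 = rng.normal(size=n)
target = rng.normal(size=n)

final = rollout(x0, target, A, B, lqr_gains(A, B, Q, R, Qf, horizon=10))
print(np.linalg.norm(final - target))
```

The weights `Q` and `R` trade off how aggressively the state is pulled toward the target against how large each control step may be.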
Zijun Zhou
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Yingying Deng
University of Science and Technology Beijing
computer vision, AIGC
Xiangyu He
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Weiming Dong
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Fan Tang
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China