The Devil is in Attention Sharing: Improving Complex Non-rigid Image Editing Faithfulness via Attention Synergy

📅 2025-12-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
In diffusion-based zero-shot image editing, complex non-rigid edits (e.g., pose or deformation) often suffer from severe distortion due to "attention collapse" in existing attention-sharing mechanisms, where either positional or semantic features dominate, impairing faithful alignment. To address this, we propose SynPS (Synergistic Position-Semantic Attention), the first framework to jointly model and dynamically reweight positional embeddings and text-guided semantic features. Furthermore, we introduce a quantitative edit-strength metric and a dynamic modulation strategy during denoising that adaptively balances edit intensity, mitigating both over-editing and under-editing. Evaluated on multiple public benchmarks and newly constructed datasets, our method significantly improves edit fidelity. In both qualitative and quantitative comparisons, SynPS consistently outperforms state-of-the-art methods, especially on large-scale non-rigid edits, where it preserves structural coherence and fine-grained realism.

📝 Abstract
Training-free image editing with large diffusion models has become practical, yet faithfully performing complex non-rigid edits (e.g., pose or shape changes) remains highly challenging. We identify a key underlying cause: attention collapse in existing attention sharing mechanisms, where either positional embeddings or semantic features dominate visual content retrieval, leading to over-editing or under-editing. To address this issue, we introduce SynPS, a method that Synergistically leverages Positional embeddings and Semantic information for faithful non-rigid image editing. We first propose an editing measurement that quantifies the required editing magnitude at each denoising step. Based on this measurement, we design an attention synergy pipeline that dynamically modulates the influence of positional embeddings, enabling SynPS to balance semantic modifications and fidelity preservation. By adaptively integrating positional and semantic cues, SynPS effectively avoids both over- and under-editing. Extensive experiments on public and newly curated benchmarks demonstrate the superior performance and faithfulness of our approach.
Problem

Research questions and friction points this paper is trying to address.

Improves faithfulness in complex non-rigid image editing
Addresses attention collapse in attention sharing mechanisms
Balances semantic modifications and fidelity preservation adaptively
Innovation

Methods, ideas, or system contributions that make the work stand out.

SynPS method synergizes positional and semantic information
Dynamic modulation of positional embeddings for editing balance
Adaptive integration of cues to prevent over- and under-editing
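The synergy idea above can be sketched as a blend of two attention-logit terms, one computed from positional embeddings and one from semantic features, with a step-dependent weight derived from a measured edit magnitude. This is a minimal illustrative sketch, not the paper's exact formulation: the function names (`synergy_attention`, `lam_schedule`), the linear schedule, and the single-head NumPy setup are all assumptions for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def synergy_attention(q_sem, k_sem, q_pos, k_pos, v, lam):
    """Blend semantic and positional attention logits with weight `lam`.

    lam -> 1: positional cues dominate (stronger fidelity preservation);
    lam -> 0: semantic cues dominate (stronger editing).
    Shapes: q_* is (Tq, d), k_* and v are (Tk, d); single head for brevity.
    """
    d = q_sem.shape[-1]
    sem_logits = q_sem @ k_sem.T / np.sqrt(d)
    pos_logits = q_pos @ k_pos.T / np.sqrt(d)
    logits = (1.0 - lam) * sem_logits + lam * pos_logits
    return softmax(logits) @ v

def lam_schedule(edit_strength, lam_max=0.9):
    # Hypothetical modulation: when the measured edit magnitude at a
    # denoising step is large, shrink the positional weight so semantic
    # features can drive the edit; when it is small, keep positional
    # cues strong to preserve the source layout.
    return lam_max * (1.0 - edit_strength)
```

A denoising loop would call `lam_schedule` once per step with the paper's edit-strength measurement, then run `synergy_attention` inside the shared-attention layers; how that measurement is computed is exactly the contribution the sketch leaves abstract.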
Zhuo Chen
University of Electronic Science and Technology of China
Fanyue Wei
National University of Singapore
Runze Xu
University of Electronic Science and Technology of China
Jingjing Li
University of Electronic Science and Technology of China
Lixin Duan
Data Intelligence Group (DIG) @ UESTC
Angela Yao
National University of Singapore
Wen Li
University of Electronic Science and Technology of China