FusionEdit: Semantic Fusion and Attention Modulation for Training-Free Image Editing

📅 2026-02-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of existing text-guided image editing methods, which often suffer from boundary artifacts and reduced editability due to their reliance on hard masks. To overcome these issues, the authors propose a training-free editing framework that automatically partitions image regions into edit and preserve areas based on semantic differences. The approach introduces a soft mask generation mechanism and integrates it with a statistics-aware attention fusion strategy within the DiT attention layers. By combining distance-aware latent space blending, AdaIN modulation, and total variation loss, the method achieves high-fidelity, natural, and controllable edits while preserving identity consistency and global semantics. Extensive experiments demonstrate that the proposed technique significantly outperforms current state-of-the-art methods across multiple quantitative and qualitative metrics.
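The soft-mask and blending ideas in the summary can be sketched in a few lines. The snippet below is an illustrative approximation, not the paper's implementation: `bandwidth` is a hypothetical falloff parameter, the exponential decay is one plausible choice of soft transition, and the total-variation term is shown on the mask itself for simplicity.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def soft_mask(hard_mask, bandwidth=4.0):
    """Turn a binary edit mask into a soft mask that decays smoothly
    with distance from the edit region (sketch; `bandwidth` is a
    hypothetical parameter, not from the paper)."""
    # Distance (in pixels) from each outside pixel to the edit region.
    dist = distance_transform_edt(1 - hard_mask)
    # 1 inside the edit region, exponential falloff outside.
    return np.exp(-dist / bandwidth)

def blend_latents(z_src, z_edit, mask):
    """Distance-aware latent blending: the soft mask weights the edited
    latent near the region and the source latent far from it."""
    return mask * z_edit + (1 - mask) * z_src

def tv_loss(mask):
    """Total variation of the mask; penalizing it encourages smooth
    boundary transitions (anisotropic TV for brevity)."""
    return (np.abs(np.diff(mask, axis=0)).sum()
            + np.abs(np.diff(mask, axis=1)).sum())
```

A hard binary mask has a large TV value concentrated at its boundary; the exponential soft mask spreads that transition over several pixels, which is the mechanism the summary credits for reducing boundary artifacts.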

📝 Abstract
Text-guided image editing aims to modify specific regions according to the target prompt while preserving the identity of the source image. Recent methods exploit explicit binary masks to constrain editing, but hard mask boundaries introduce artifacts and reduce editability. To address these issues, we propose FusionEdit, a training-free image editing framework that achieves precise and controllable edits. First, editing and preserved regions are automatically identified by measuring semantic discrepancies between source and target prompts. To mitigate boundary artifacts, FusionEdit performs distance-aware latent fusion along region boundaries to yield a soft, accurate mask, and employs a total variation loss to enforce smooth transitions, obtaining natural editing results. Second, FusionEdit leverages AdaIN-based modulation within DiT attention layers to perform statistical attention fusion in the editing region, enhancing editability while preserving global consistency with the source image. Extensive experiments demonstrate that FusionEdit significantly outperforms state-of-the-art methods. Code is available at \href{https://github.com/Yvan1001/FusionEdit}{https://github.com/Yvan1001/FusionEdit}.
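The "AdaIN-based modulation" mentioned in the abstract refers to adaptive instance normalization: re-normalizing one feature tensor to match the channel-wise statistics of another. A minimal sketch, assuming features shaped `(channels, tokens)` and normalization over the token axis (the paper's exact axis conventions and where in the DiT attention layers this is applied are not specified here):

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Adaptive instance normalization: shift/scale `content` so its
    per-channel mean and std match those of `style`. Sketch only; the
    axis convention (stats over the last axis) is an assumption."""
    c_mean = content.mean(axis=-1, keepdims=True)
    c_std = content.std(axis=-1, keepdims=True)
    s_mean = style.mean(axis=-1, keepdims=True)
    s_std = style.std(axis=-1, keepdims=True)
    # Whiten the content statistics, then re-color with the style's.
    return s_std * (content - c_mean) / (c_std + eps) + s_mean
```

In a statistics-aware attention fusion, a transfer like this would align edited-branch attention features with source-branch statistics inside the edit region, which matches the abstract's claim of enhancing editability while keeping global consistency.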
Problem

Research questions and friction points this paper is trying to address.

text-guided image editing
mask artifacts
identity preservation
boundary artifacts
editability
Innovation

Methods, ideas, or system contributions that make the work stand out.

training-free image editing
semantic fusion
attention modulation
latent fusion
AdaIN