FIA-Edit: Frequency-Interactive Attention for Efficient and High-Fidelity Inversion-Free Text-Guided Image Editing

📅 2025-11-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing flow-based, inversion-free text-guided image editing methods struggle to preserve source image backgrounds, often causing spatial inconsistencies and over-editing. To address this, we propose FIA-Edit, a computationally efficient, inversion-free editing framework built on a frequency-interactive attention mechanism. The mechanism exchanges frequency components between source and target features within self-attention layers, and a separate feature injection module explicitly feeds source-side queries, keys, values, and text embeddings into the target branch's cross-attention. Together these achieve cross-domain alignment and structural fidelity without latent inversion. On an RTX 4090, a single edit of a 512×512 image takes approximately six seconds. Quantitative and qualitative evaluations show that FIA-Edit outperforms existing methods in visual quality, background fidelity, and controllability. Notably, FIA-Edit is the first text-guided editing method applied to clinical surgical images: it synthesizes anatomically coherent hemorrhage variations for data augmentation, yielding significant gains in downstream bleeding classification.

📝 Abstract

Text-guided image editing has advanced rapidly with the rise of diffusion models. While flow-based inversion-free methods offer high efficiency by avoiding latent inversion, they often fail to integrate source information effectively, leading to poor background preservation, spatial inconsistencies, and over-editing. In this paper, we present FIA-Edit, a novel inversion-free framework that achieves high-fidelity and semantically precise edits through Frequency-Interactive Attention. Specifically, we design two key components: (1) a Frequency Representation Interaction (FRI) module that enhances cross-domain alignment by exchanging frequency components between source and target features within self-attention, and (2) a Feature Injection (FIJ) module that explicitly incorporates source-side queries, keys, values, and text embeddings into the target branch's cross-attention to preserve structure and semantics. Extensive experiments demonstrate that FIA-Edit supports high-fidelity editing at low computational cost (~6 s per 512×512 image on an RTX 4090) and consistently outperforms existing methods across diverse tasks in visual quality, background fidelity, and controllability. Furthermore, we are the first to extend text-guided image editing to clinical applications. By synthesizing anatomically coherent hemorrhage variations in surgical images, FIA-Edit opens new opportunities for medical data augmentation and delivers significant gains in downstream bleeding classification. Our project is available at: https://github.com/kk42yy/FIA-Edit.
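To make the idea of "exchanging frequency components between source and target features" concrete, here is a minimal NumPy sketch. It is not the paper's FRI module: the abstract does not say which frequency band is exchanged, in which direction, or at which layer, so the choice below (copying a low-frequency disk from the source into the target) is purely an assumption for illustration, and the names `low_pass_mask`, `exchange_low_freq`, and `radius` are hypothetical.

```python
import numpy as np

def low_pass_mask(h, w, radius):
    """Boolean (h, w) mask selecting a disk around the zero frequency,
    assuming the spectrum has been centred with fftshift."""
    yy, xx = np.ogrid[:h, :w]
    cy, cx = h // 2, w // 2
    return (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2

def exchange_low_freq(src, tgt, radius=4):
    """Return a copy of `tgt` whose low-frequency band comes from `src`.

    src, tgt: real arrays of shape (C, H, W), e.g. feature maps.
    Assumption: low frequencies carry layout/structure; the paper may
    exchange different components.
    """
    h, w = src.shape[-2:]
    mask = low_pass_mask(h, w, radius)
    axes = (-2, -1)
    # Centre the 2-D spectra so the mask disk sits on the zero frequency.
    src_f = np.fft.fftshift(np.fft.fft2(src, axes=axes), axes=axes)
    tgt_f = np.fft.fftshift(np.fft.fft2(tgt, axes=axes), axes=axes)
    # Inside the disk take the source spectrum, elsewhere keep the target.
    mixed = np.where(mask, src_f, tgt_f)
    out = np.fft.ifft2(np.fft.ifftshift(mixed, axes=axes), axes=axes)
    return out.real  # imaginary part is numerical noise for real inputs

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    source_feat = rng.normal(size=(4, 16, 16))
    target_feat = rng.normal(size=(4, 16, 16))
    blended = exchange_low_freq(source_feat, target_feat, radius=3)
    print(blended.shape)
```

With `radius=0` only the per-channel mean (the DC term) is taken from the source; with a radius that covers the whole spectrum the output equals the source. Intermediate values trade source structure against target detail.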

Problem

Research questions and friction points this paper is trying to address.

Improves background preservation in inversion-free image editing

Addresses spatial inconsistencies during text-guided image manipulation

Reduces over-editing by better integrating source information

Innovation

Methods, ideas, or system contributions that make the work stand out.

Frequency-Interactive Attention delivers high-fidelity, semantically precise edits without latent inversion

Frequency Representation Interaction enhances cross-domain alignment

Feature Injection module preserves structure and semantics
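The feature-injection idea listed above can be illustrated with a small NumPy sketch in which target queries attend over both target and source keys and values. This is one common way to inject source features into an attention layer and is shown only as an assumption: the summary does not give the actual FIJ fusion rule, which also involves source queries and text embeddings. The function names here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention for q: (Nq, d), k: (Nk, d), v: (Nk, dv)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def attention_with_injection(q_tgt, k_tgt, v_tgt, k_src, v_src):
    """Target queries attend over the target AND source keys/values.

    Assumption: injection is done by concatenating along the token axis;
    the paper's module may fuse source features differently.
    """
    k = np.concatenate([k_tgt, k_src], axis=0)
    v = np.concatenate([v_tgt, v_src], axis=0)
    return attention(q_tgt, k, v)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q_t = rng.normal(size=(5, 8))
    k_t, v_t = rng.normal(size=(7, 8)), rng.normal(size=(7, 8))
    k_s, v_s = rng.normal(size=(7, 8)), rng.normal(size=(7, 8))
    out = attention_with_injection(q_t, k_t, v_t, k_s, v_s)
    print(out.shape)
```

Because each output row is a convex combination of all value rows, source values can pull the target output toward source content wherever the target queries match source keys, which is the intuition behind using injection to preserve structure.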

🔎 Similar Papers

TextureDiffusion: Target Prompt Disentangled Editing for Various Texture Transfer