FIA-Edit: Frequency-Interactive Attention for Efficient and High-Fidelity Inversion-Free Text-Guided Image Editing

📅 2025-11-15
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing flow-based, inversion-free text-guided image editing methods struggle to preserve the structural integrity of source image backgrounds, often causing spatial distortions and over-editing. To address this, we propose FIA-Edit, a computationally efficient, inversion-free editing framework built upon a novel frequency-domain interactive attention mechanism. This mechanism is the first to exchange frequency components across source and target features within self-attention layers and incorporates an explicit feature injection module to fuse source-domain semantic information. Operating entirely within the diffusion model's forward process, FIA-Edit achieves cross-domain alignment and structural fidelity without latent inversion. On 512×512 images, it completes a single edit in approximately six seconds. Quantitative and qualitative evaluations demonstrate superior performance over state-of-the-art methods in visual quality, background preservation, and text alignment accuracy. Notably, FIA-Edit is the first such method successfully applied to clinical surgical image enhancement, significantly improving model performance in hemorrhage classification tasks.

📝 Abstract
Text-guided image editing has advanced rapidly with the rise of diffusion models. While flow-based inversion-free methods offer high efficiency by avoiding latent inversion, they often fail to effectively integrate source information, leading to poor background preservation, spatial inconsistencies, and over-editing. In this paper, we present FIA-Edit, a novel inversion-free framework that achieves high-fidelity and semantically precise edits through Frequency-Interactive Attention. Specifically, we design two key components: (1) a Frequency Representation Interaction (FRI) module that enhances cross-domain alignment by exchanging frequency components between source and target features within self-attention, and (2) a Feature Injection (FIJ) module that explicitly incorporates source-side queries, keys, values, and text embeddings into the target branch's cross-attention to preserve structure and semantics. Extensive experiments demonstrate that FIA-Edit supports high-fidelity editing at low computational cost (~6s per 512×512 image on an RTX 4090) and consistently outperforms existing methods across diverse tasks in visual quality, background fidelity, and controllability. Furthermore, we are the first to extend text-guided image editing to clinical applications. By synthesizing anatomically coherent hemorrhage variations in surgical images, FIA-Edit opens new opportunities for medical data augmentation and delivers significant gains in downstream bleeding classification. Our project is available at: https://github.com/kk42yy/FIA-Edit.
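As a rough illustration of what exchanging frequency components between source and target features could look like, the NumPy sketch below replaces the low-frequency band of a target feature map with that of the source. The abstract does not say which bands are exchanged or how the exchange is wired into self-attention, so the band choice, the `radius` cutoff, and the function names `low_pass_mask` and `exchange_frequencies` are assumptions made for illustration only, not the FRI module itself.

```python
import numpy as np


def low_pass_mask(h, w, radius):
    """Boolean (h, w) mask selecting frequencies within `radius` of DC.

    Uses the unshifted FFT layout, so DC sits at index [0, 0].
    """
    fy = np.fft.fftfreq(h) * h  # integer-valued frequency indices per row
    fx = np.fft.fftfreq(w) * w  # integer-valued frequency indices per column
    dist = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    return dist <= radius


def exchange_frequencies(src, tgt, radius=2.0):
    """Return target features whose low-frequency band comes from the source.

    src, tgt: real arrays of shape (H, W, C). Which band to swap is an
    assumption of this sketch; the abstract does not specify it.
    """
    if src.shape != tgt.shape:
        raise ValueError("source and target features must share a shape")
    h, w, _ = src.shape
    mask = low_pass_mask(h, w, radius)[:, :, None]  # broadcast over channels
    src_f = np.fft.fft2(src, axes=(0, 1))
    tgt_f = np.fft.fft2(tgt, axes=(0, 1))
    mixed_f = np.where(mask, src_f, tgt_f)  # source inside the band, target outside
    # The mask is symmetric under frequency negation, so the result is real
    # up to floating-point error; drop the residual imaginary part.
    return np.fft.ifft2(mixed_f, axes=(0, 1)).real
```

With this choice the output inherits the source's coarse layout (including its per-channel mean) while keeping the target's fine detail; a very large `radius` returns the source unchanged and a negative one returns the target unchanged.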
Problem

Research questions and friction points this paper is trying to address.

Improves background preservation in inversion-free image editing
Addresses spatial inconsistencies during text-guided image manipulation
Reduces over-editing by better integrating source information
Innovation

Methods, ideas, or system contributions that make the work stand out.

Frequency-Interactive Attention enables inversion-free editing
Frequency Representation Interaction enhances cross-domain alignment
Feature Injection module preserves structure and semantics
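One common way to let a target branch draw on source features is to append the source keys and values to the target's own before attention. The NumPy sketch below shows that pattern for a single head. It is an assumption-laden illustration: the summary does not describe how the Feature Injection module combines the two branches, the sketch leaves out the source queries and text embeddings, and the names `attention` and `attention_with_injection` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    """Single-head scaled dot-product attention.

    q: (n_q, d), k: (n_k, d), v: (n_k, d_v) -> (n_q, d_v)
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def attention_with_injection(q_tgt, k_tgt, v_tgt, k_src, v_src):
    """Target queries attend over target AND source tokens.

    Injection by concatenation along the token axis is an assumption of
    this sketch, not a detail confirmed by the paper summary.
    """
    k = np.concatenate([k_tgt, k_src], axis=0)
    v = np.concatenate([v_tgt, v_src], axis=0)
    return attention(q_tgt, k, v)
```

Because each output row is a convex combination of the concatenated values, source tokens can only pull the target toward features that already exist in the source, which is the intuition behind using injection for structure preservation.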
Authors

Kaixiang Yang
Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology
Boyang Shen
Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology
Xin Li
Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology
Yuchen Dai
College of Life Science and Technology, Huazhong University of Science and Technology
Yuxuan Luo
City University of Hong Kong
Few-shot learning, Zero-shot learning, Continual learning
Yueran Ma
College of Life Science and Technology, Huazhong University of Science and Technology
Wei Fang
Wuhan United Imaging Healthcare Surgical Technology Co., Ltd
Qiang Li
Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology
Zhiwei Wang
Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology