DirectEdit: Step-Level Accurate Inversion for Flow-Based Image Editing

📅 2026-05-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the loss of reconstruction fidelity and the accumulated drift seen in existing training-free image editing methods. Both arise because the reconstruction path is approximated with noisy latents taken from mismatched timesteps. To eliminate this inherent reconstruction error, the authors propose DirectEdit, an approach that, unlike most prior work, directly aligns the forward paths instead of rectifying the inversion path. The method combines attention feature injection with multi-branch, mask-guided noise blending, balancing editability and fidelity without adding neural function evaluations (NFEs). Experiments across diverse editing scenarios show that DirectEdit outperforms state-of-the-art methods, enabling efficient, accurate, training-free image editing.
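The mask-guided blending mentioned above can be pictured with a minimal sketch. It assumes a single mask with values in [0, 1] (1 = region to edit) and two branches that each predict a velocity for the current latent: inside the mask the editing branch is used, and outside it the source (reconstruction) branch is kept. The names `blend_predictions` and `euler_step` and the flat-list latents are hypothetical; the paper's multi-branch scheme may combine more branches or weight them differently.

```python
def blend_predictions(v_src, v_edit, mask):
    """Hypothetical mask-guided blend of two branch predictions.

    Where mask == 1 the editing branch's prediction is used; where mask == 0
    the source (reconstruction) branch's prediction is kept, so unedited
    regions follow the reconstruction path. Fractional mask values give a
    linear interpolation, which softens seams at mask boundaries.
    """
    if not (len(v_src) == len(v_edit) == len(mask)):
        raise ValueError("inputs must have the same length")
    return [m * e + (1.0 - m) * s for s, e, m in zip(v_src, v_edit, mask)]


def euler_step(z, v, dt):
    """One Euler step of a flow ODE, applied elementwise: z <- z + dt * v."""
    return [zi + dt * vi for zi, vi in zip(z, v)]


# Toy usage: four "pixels"; only the last two lie inside the edit mask.
v_src = [0.1, 0.2, 0.3, 0.4]
v_edit = [1.0, 1.0, 1.0, 1.0]
mask = [0.0, 0.0, 1.0, 1.0]
v = blend_predictions(v_src, v_edit, mask)   # source values outside, edit values inside
z_next = euler_step([0.0] * 4, v, dt=0.5)
```

Blending the predictions rather than the final images keeps the unmasked region on the reconstruction trajectory at every step, which is why this kind of blending costs no extra network evaluations: both branch predictions are already computed.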
📝 Abstract
With recent advancements in large-scale pre-trained text-to-image (T2I) models, training-free image editing methods have demonstrated remarkable success. Typically, these methods involve adding noise to a clean image via an inversion process, followed by separate denoising steps for the reconstruction and editing paths during the forward process. However, since the reconstruction path is approximated using noisy latents from mismatched timesteps, existing methods inevitably suffer from accumulated drift, which fundamentally limits reconstruction fidelity. To address this challenge, we systematically analyze the inversion process within the flow transformer and propose DirectEdit, a simple yet effective editing method that eliminates the inherent reconstruction error without introducing additional neural function evaluations (NFEs). Unlike most prior works that attempt to rectify the inversion path, DirectEdit focuses on directly aligning the forward paths, enabling precise reconstruction and reliable feature sharing. Furthermore, we introduce a preservation mechanism based on attention feature injection and multi-branch mask-guided noise blending, which effectively balances fidelity and editability. Extensive experiments across diverse scenarios demonstrate that DirectEdit achieves efficient and accurate image editing, outperforming state-of-the-art methods. Code and examples are available at https://desongyang.github.io/Directedit.
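The drift described in the abstract can be reproduced on a toy one-dimensional flow. This sketch assumes the common rectified-flow convention (t = 0 is the clean image, t = 1 is noise), Euler steps, and a hypothetical velocity field `velocity` standing in for the flow transformer. Naive inversion evaluates the velocity at the current timestep t_i to jump to t_{i+1}, while the forward (denoising) sampler evaluates it at t_{i+1}, so the two paths do not match and the round trip misses the original value. The `forward_replay` function shows one generic way to make a reconstruction branch exact without extra NFEs, by reusing the velocities cached during inversion; it is an illustration, not DirectEdit's published algorithm.

```python
import math


def velocity(z, t):
    """Hypothetical toy velocity field standing in for a flow transformer."""
    return math.sin(3.0 * z) + t * z


def invert(x0, n_steps):
    """Naive Euler inversion from t=0 (image) to t=1 (noise).

    Each step uses the velocity at the *current* timestep t_i to reach
    t_{i+1}; an exact inverse of the sampler would need the velocity at
    t_{i+1}, which is not yet known. Returns the timesteps, the latent
    trajectory, and the velocities used (one NFE per step).
    """
    ts = [i / n_steps for i in range(n_steps + 1)]
    zs, vs = [x0], []
    for i in range(n_steps):
        v = velocity(zs[-1], ts[i])
        vs.append(v)
        zs.append(zs[-1] + (ts[i + 1] - ts[i]) * v)
    return ts, zs, vs


def forward_denoise(z1, ts):
    """Standard Euler sampler from t=1 back to t=0 (one NFE per step)."""
    z = z1
    for i in range(len(ts) - 1, 0, -1):
        z = z + (ts[i - 1] - ts[i]) * velocity(z, ts[i])
    return z


def forward_replay(z1, ts, vs):
    """Reconstruction that replays the cached inversion velocities (0 NFEs)."""
    z = z1
    for i in range(len(ts) - 1, 0, -1):
        z = z - (ts[i] - ts[i - 1]) * vs[i - 1]
    return z


x0 = 0.7
ts, zs, vs = invert(x0, n_steps=10)
naive_err = abs(forward_denoise(zs[-1], ts) - x0)
replay_err = abs(forward_replay(zs[-1], ts, vs) - x0)
print(f"naive inversion + Euler sampling error: {naive_err:.2e}")
print(f"replayed-velocity reconstruction error: {replay_err:.2e}")
```

The naive round trip leaves a small but nonzero error that grows with coarser step sizes and more curved velocity fields, while the replayed path returns to `x0` up to floating-point rounding, because each step exactly undoes the corresponding inversion step.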
Problem

Research questions and friction points this paper is trying to address.

image editing
reconstruction fidelity
inversion drift
flow-based models
text-to-image
Innovation

Methods, ideas, or system contributions that make the work stand out.

flow-based inversion
training-free editing
forward path alignment
attention feature injection
mask-guided noise blending
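Attention feature injection can take several forms. One common variant, sketched below under that assumption, lets the editing branch keep its own queries but attend over the keys and values computed in the source branch, so the edited output is assembled from source-image features. The helper names and the boolean `inject` switch (standing in for whatever schedule of steps and layers a method uses) are hypothetical; which features DirectEdit injects, and where, is not specified in this summary.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention on lists of token vectors."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        w = softmax(scores)
        out.append([sum(wj * vj[c] for wj, vj in zip(w, v)) for c in range(len(v[0]))])
    return out


def edit_branch_attention(q_edit, k_edit, v_edit, k_src, v_src, inject):
    """Hypothetical injection rule: when `inject` is True, the editing branch
    attends with its own queries over the source branch's keys and values;
    otherwise it uses its own keys and values as usual."""
    if inject:
        return attention(q_edit, k_src, v_src)
    return attention(q_edit, k_edit, v_edit)


# Toy usage: two tokens with 2-d features.
q_edit = [[1.0, 0.0], [0.0, 1.0]]
k_both = [[1.0, 0.0], [0.0, 1.0]]
v_src = [[1.0, 0.0], [1.0, 0.0]]    # source-image features
v_edit = [[0.0, 1.0], [0.0, 1.0]]   # editing-branch features
out_inj = edit_branch_attention(q_edit, k_both, v_edit, k_both, v_src, inject=True)
out_plain = edit_branch_attention(q_edit, k_both, v_edit, k_both, v_src, inject=False)
```

With injection on, every output row is a convex combination of the source values, so it reproduces the source features; with injection off, the branch falls back to its own features. Restricting `inject` to early steps or selected layers is the usual lever for trading fidelity against editability.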