🤖 AI Summary
This paper addresses the challenge of simultaneously achieving high editing speed, content fidelity, and instruction adherence in text-guided image editing. We propose an efficient editing framework built upon Rectified Flow (RF). Our key contributions are: (1) a piecewise skip-trajectory design coupled with a dedicated reverse process, PerRFI, enabling high-fidelity editing in ≤5 denoising steps; (2) Inversion Latent Injection and Disentangled Prompt Guidance to enhance semantic controllability and precision; and (3) integration of a Canny-conditioned ControlNet to impose structural priors, effectively suppressing artifacts and improving detail consistency. Experiments on the PIE benchmark demonstrate that our method outperforms existing state-of-the-art approaches both qualitatively and quantitatively, while drastically reducing inference steps. It achieves a favorable trade-off among editing efficiency, visual fidelity, and accurate semantic alignment with user instructions.
📝 Abstract
We propose a fast text-guided image editing method called InstantEdit based on the RectifiedFlow framework, structured as a few-step editing process that preserves critical content while closely following textual instructions. Our approach leverages the straight sampling trajectories of RectifiedFlow by introducing a specialized inversion strategy called PerRFI. To keep results consistent yet editable with the RectifiedFlow model, we further propose a novel regeneration method, Inversion Latent Injection, which effectively reuses latent information obtained during inversion to facilitate more coherent and detailed regeneration. Additionally, we propose a Disentangled Prompt Guidance technique to balance editability with detail preservation, and integrate a Canny-conditioned ControlNet to incorporate structural cues and suppress artifacts. Evaluation on the PIE image editing dataset demonstrates that InstantEdit is not only fast but also achieves better qualitative and quantitative results compared to state-of-the-art few-step editing methods.
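The abstract relies on the fact that RectifiedFlow trajectories are near-straight, so an image latent can be inverted to noise (and regenerated) with only a handful of Euler steps. The sketch below illustrates this generic few-step Euler inversion/regeneration loop, including saving the per-step latents that an injection scheme like Inversion Latent Injection could reuse. It is a minimal illustration, not the paper's PerRFI algorithm: the function names and the `v_theta` velocity-model interface are assumptions, and a toy constant velocity field stands in for a trained network.

```python
import numpy as np

def euler_invert(v_theta, z0, num_steps=4):
    """Invert an image latent z0 toward noise by integrating the
    rectified-flow ODE dz/dt = v_theta(z, t) from t=0 to t=1 with a
    few Euler steps; near-straight RF trajectories keep the
    discretization error small even at <=5 steps."""
    z = z0.copy()
    traj = [z.copy()]           # per-step latents, kept for reuse at regeneration
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        z = z + dt * v_theta(z, t)
        traj.append(z.copy())
    return z, traj              # inverted noise latent and the saved trajectory

def euler_regenerate(v_theta, z1, num_steps=4):
    """Regenerate by integrating the same ODE backward from t=1 to t=0,
    starting from the inverted noise latent z1 (with an edited prompt,
    v_theta would be conditioned on the new text)."""
    z = z1.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = 1.0 - i * dt
        z = z - dt * v_theta(z, t)
    return z

# Toy check: with a constant velocity field the straight trajectory is
# integrated exactly, so inversion followed by regeneration round-trips.
v = lambda z, t: np.ones_like(z)
z0 = np.zeros((1, 4))
z1, traj = euler_invert(v, z0)
z_back = euler_regenerate(v, z1)
```

With a real model, `v_theta` would be a text-conditioned velocity network, and the saved `traj` latents are where an injection mechanism can re-anchor the regeneration to preserve source content.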