AI Summary
In text-guided image editing, conventional diffusion-based and rectified flow (RF)-based inversion methods accumulate drift because each step approximates the next noisy latent from the current one, which degrades reconstruction accuracy. To address this, we propose three components. Direct Noise Alignment (DNA) refines the Gaussian noise directly in the noise domain, avoiding the drift inherent in RF inversion: it estimates the velocity field at the interpolated noisy latent for each timestep and corrects the noise using the difference between the predicted and expected velocities. Mobile Velocity Guidance (MVG) steers the target-prompt generation process to balance background preservation against target object editability. DNA-Bench is a benchmark tailored to long-prompt editing. Evaluation on DNA-Bench and multiple quantitative metrics shows improvements over state-of-the-art methods in edit accuracy, structural fidelity, and prompt adherence. Code and the benchmark are to be released on the project page.
Abstract
Leveraging the powerful generation capability of large-scale pretrained text-to-image models, training-free methods have demonstrated impressive image editing results. Conventional diffusion-based methods, as well as recent rectified flow (RF)-based methods, typically reverse synthesis trajectories by gradually adding noise to clean images. During this process, the noisy latent at the current timestep is used to approximate the one at the next timestep, which introduces accumulated drift and degrades reconstruction accuracy. Since in RF the noisy latent at each timestep is obtained by direct interpolation between Gaussian noise and the clean image, we propose Direct Noise Alignment (DNA), which directly refines the desired Gaussian noise in the noise domain, significantly reducing the error accumulation of previous methods. Specifically, DNA estimates the velocity field of the interpolated noisy latent at each timestep and adjusts the Gaussian noise by computing the difference between the predicted and expected velocity fields. We validate the effectiveness of DNA and reveal its relationship with existing RF-based inversion methods. Additionally, we introduce Mobile Velocity Guidance (MVG) to control the target prompt-guided generation process, balancing image background preservation and target object editability. DNA and MVG together constitute our proposed method, DNAEdit. Finally, we introduce DNA-Bench, a long-prompt benchmark, to evaluate the performance of advanced image editing models. Experimental results demonstrate that DNAEdit outperforms state-of-the-art text-guided editing methods. Code and the benchmark will be available at https://xiechenxi99.github.io/DNAEdit/.
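The noise-alignment step described in the abstract can be illustrated with a small NumPy sketch. This is not the DNAEdit implementation. The function names (`toy_velocity`, `velocity_residual`, `align_noise`), the `step_size` parameter, the additive update rule, the timestep schedule, and the convention that t = 0 is the clean image and t = 1 is pure noise are all assumptions made for illustration. The velocity model is a toy linear field standing in for a pretrained RF network.

```python
import numpy as np


def toy_velocity(x_t, t):
    """Hypothetical stand-in for a pretrained RF velocity network v_theta(x_t, t).

    A real model would also be conditioned on a text prompt; this linear
    field is chosen only so that the sketch runs without any weights.
    """
    return -x_t


def velocity_residual(x0, eps, t, velocity_fn):
    """Difference between predicted and expected velocity for a noise candidate."""
    # RF interpolation between the clean image x0 and the noise eps (assumed convention).
    x_t = (1.0 - t) * x0 + t * eps
    # Velocity the model predicts at the interpolated latent.
    v_pred = velocity_fn(x_t, t)
    # Velocity of the straight path from x0 to eps.
    v_expected = eps - x0
    return v_pred - v_expected


def align_noise(x0, eps, timesteps, velocity_fn, step_size=0.5):
    """Adjust the noise itself at each timestep instead of stepping a latent.

    The additive update with a fixed step size is an assumed rule for this sketch.
    """
    for t in timesteps:
        eps = eps + step_size * velocity_residual(x0, eps, t, velocity_fn)
    return eps


rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 4))        # stand-in for a clean image latent
eps_init = rng.standard_normal((4, 4))  # initial Gaussian noise
timesteps = np.linspace(0.05, 1.0, 20)  # assumed schedule from near-image to noise

eps_aligned = align_noise(x0, eps_init, timesteps, toy_velocity)

# Compare the velocity mismatch at t = 1 before and after alignment.
before = np.linalg.norm(velocity_residual(x0, eps_init, 1.0, toy_velocity))
after = np.linalg.norm(velocity_residual(x0, eps_aligned, 1.0, toy_velocity))
print(f"residual before: {before:.4f}, after: {after:.2e}")
```

In this sketch every timestep re-interpolates between the fixed clean image and the current noise estimate, so no approximated latent is carried from one step to the next. That is the property the abstract credits with reducing accumulated drift.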