🤖 AI Summary
Existing inversion-free image editing methods often suffer from structural degradation and quality loss due to their reliance on fixed Gaussian noise to construct source trajectories. This work proposes SNR-Edit, a training-free editing framework that introduces structure-aware adaptive noise control into inversion-free flow models for the first time. By injecting segmentation-based constraints into the initial noise, SNR-Edit anchors the stochastic components of the source trajectory to the implicit inversion location of the original image, effectively mitigating trajectory drift. The method achieves high-fidelity structural preservation on flow-based generative models such as SD3 and FLUX without requiring fine-tuning or explicit inversion. Experiments demonstrate that SNR-Edit significantly improves both pixel-level metrics and VLM-based scores on PIE-Bench and SNR-Bench, with only an additional computational overhead of approximately one second per image.
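The core idea above — replacing the fixed Gaussian noise with noise that is pulled toward the source image's implicit inversion latent inside structure-critical regions — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `rectify_noise`, the blending weight `alpha`, and the variance-preserving blend are all assumptions; the paper's actual rectification rule may differ.

```python
import numpy as np

def rectify_noise(noise, seg_mask, implicit_latent, alpha=0.7):
    """Hypothetical structure-aware noise rectification (sketch).

    noise           : fixed Gaussian noise, shape (H, W) or (C, H, W)
    seg_mask        : boolean mask of structure-critical regions
                      (e.g. from a segmentation model), same shape
    implicit_latent : the source image's implicit inversion position
                      in latent space (assumed roughly unit-variance)
    alpha           : how strongly masked regions are anchored to the
                      implicit latent (0 = pure noise, 1 = pure latent)
    """
    # Variance-preserving blend: inside the mask, pull the noise
    # toward the implicit inversion latent so the source trajectory
    # starts near the real image's position.
    anchored = alpha * implicit_latent + np.sqrt(1.0 - alpha**2) * noise
    # Outside the mask, keep the original Gaussian noise untouched.
    return np.where(seg_mask, anchored, noise)
```

Because the rectification only modifies the initial noise tensor (one blend plus a masked select), it adds negligible cost relative to the sampling loop itself, which is consistent with the roughly one-second overhead reported above.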
📝 Abstract
Inversion-free image editing using flow-based generative models challenges the prevailing inversion-based pipelines. However, existing approaches rely on fixed Gaussian noise to construct the source trajectory, leading to biased trajectory dynamics and causing structural degradation or quality loss. To address this, we introduce SNR-Edit, a training-free framework that achieves faithful Latent Trajectory Correction via adaptive noise control. Mechanistically, SNR-Edit uses structure-aware noise rectification to inject segmentation constraints into the initial noise, anchoring the stochastic component of the source trajectory to the real image's implicit inversion position and reducing trajectory drift during source-to-target transport. This lightweight modification yields smoother latent trajectories and ensures high-fidelity structural preservation without requiring model tuning or explicit inversion. Across SD3 and FLUX, evaluations on PIE-Bench and SNR-Bench show that SNR-Edit delivers strong performance on both pixel-level metrics and VLM-based scoring, while adding only about one second of overhead per image.