🤖 AI Summary
Existing training-free text-to-image editors fail when forced into a single inference step, producing object distortion and loss of consistency in non-edited regions, because naive vector arithmetic yields high-energy, erratic editing trajectories. This work formulates image editing as a dynamic optimal transport problem between the distributions defined by the source and target text prompts, introducing a universal, training-free, and inversion-free framework. By constructing a low-energy, smooth, and variance-reduced editing field, the method enables large-step integration in a single pass, achieving high-fidelity, lightweight one-step editing. It substantially improves editing stability and accuracy while preserving consistency in unmodified regions, outperforming current training-free single-step approaches.
📝 Abstract
The advent of one-step text-to-image (T2I) models offers unprecedented synthesis speed. However, their application to text-guided image editing remains severely hampered: forcing existing training-free editors into a single inference step fails, manifesting as severe object distortion and a critical loss of consistency in non-edited regions. This failure stems from the high-energy, erratic trajectories produced by naive vector arithmetic on the models' structured fields. To address this problem, we introduce ChordEdit, a model-agnostic, training-free, and inversion-free method for high-fidelity one-step editing. We recast editing as a transport problem between the source and target distributions defined by the corresponding text prompts. Leveraging dynamic optimal transport theory, we derive a principled, low-energy control strategy that yields a smoothed, variance-reduced editing field. Because this field is inherently stable, it can be traversed in a single, large integration step. Theoretically grounded and experimentally validated, ChordEdit delivers fast, lightweight, and precise edits, finally achieving true real-time editing on these challenging models.
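The "single, large integration step" idea can be illustrated with a toy Euler integration. The velocity field below is a hypothetical stand-in (a simple linear field toward a target latent), not the paper's actual editing field: for a sufficiently smooth, low-curvature field, one large step along the chord lands exactly where many small steps only approximate, which is the regime the smoothed editing field is designed to create.

```python
import numpy as np

def editing_field(x, target):
    # Hypothetical smooth "editing field": a linear velocity pointing from
    # the current point toward the target. Low-curvature fields like this
    # are the ideal case in which one large Euler step is exact.
    return target - x

source = np.array([0.0, 0.0])  # stand-in for a source latent
target = np.array([1.0, 2.0])  # stand-in for the edited latent

# One large integration step (step size 1.0) along the straight chord:
one_step = source + 1.0 * editing_field(source, target)

# Many small Euler steps over the same field (a multi-step trajectory):
x = source.copy()
for _ in range(100):
    x = x + 0.01 * editing_field(x, target)

# For this linear field the single large step reaches the target exactly,
# while the 100-step trajectory still carries a residual error.
print(np.allclose(one_step, target))      # True
print(np.linalg.norm(x - target))         # small but nonzero residual
```

The sketch only shows why smoothness of the field, rather than step count, governs one-step accuracy; a real high-curvature (high-energy) field would make the large step overshoot or distort, which is the failure mode the abstract describes.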