🤖 AI Summary
This work addresses reward-guided image editing without any model training. We propose a training-free framework grounded in optimal control theory: the reverse process of a diffusion or flow-matching model is formulated as a controllable trajectory originating from the source image, and editing is cast as an optimal control problem over that trajectory. To guide editing while avoiding reward hacking, we iteratively optimize the trajectory via adjoint states, enabling gradient-driven updates without backpropagation through the generative model. Our key contribution is the first integration of the adjoint method with the reverse process of generative models, explicitly decoupling preservation of semantic fidelity from reward maximization. Experiments demonstrate that our approach significantly outperforms existing training-free baselines across diverse editing tasks, achieving a superior trade-off between reward maximization and source-image fidelity.
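For readers who want the shape of this formulation, one plausible way to write it down is sketched below; the drift $f_\theta$, control $u_t$, reward $r$, and regularization weight $\lambda$ are our notation for illustration, not symbols taken from the paper.

```latex
% A plausible formalization of the controlled reverse process (notation ours):
% the state x_t evolves under the frozen model drift f_theta plus a control u_t,
% starting from the source image; the control is chosen to raise a reward r on the
% final state while a quadratic penalty keeps the edit close to the original trajectory.
\begin{align}
  \dot{x}_t &= f_\theta(x_t, t) + u_t, & x_0 &= x^{\mathrm{src}}, \\
  \min_{u}\; J(u) &= -\, r(x_T) + \lambda \int_0^T \lVert u_t \rVert^2 \, dt. &&
\end{align}
% The adjoint state a_t = \partial J / \partial x_t is integrated backward in time, so
% gradients with respect to every control come from one reverse sweep rather than from
% backpropagating through the unrolled sampler:
\begin{align}
  \dot{a}_t &= -\Big(\tfrac{\partial f_\theta}{\partial x}(x_t, t)\Big)^{\!\top} a_t,
  & a_T &= -\nabla_{x_T} r(x_T), \\
  \frac{\partial J}{\partial u_t} &= a_t + 2\lambda\, u_t. &&
\end{align}
```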
📝 Abstract
Recent advances in diffusion and flow-matching models have demonstrated remarkable capabilities in high-fidelity image synthesis. A prominent line of research uses reward guidance, which steers the generation process at inference time to align with specific objectives. However, applying this reward-guided approach to image editing, which requires preserving the semantic content of the source image while maximizing a target reward, remains largely unexplored. In this work, we introduce a novel framework for training-free, reward-guided image editing. We formulate editing as a trajectory optimal control problem in which the reverse process of a diffusion model is treated as a controllable trajectory originating from the source image, and the adjoint states are iteratively updated to steer the edit. Through extensive experiments across distinct editing tasks, we demonstrate that our approach significantly outperforms existing inversion-based, training-free guidance baselines, achieving a superior balance between reward maximization and fidelity to the source image without reward hacking.
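Below is a minimal, self-contained sketch of the kind of adjoint sweep the abstract describes, worked on a toy drift and reward; `velocity_field`, `reward`, and all hyperparameters are illustrative stand-ins under our own assumptions, not the paper's models or settings.

```python
import torch

# Toy stand-ins (assumptions for illustration, not the paper's models or rewards):
# velocity_field plays the role of a frozen diffusion/flow drift, and reward is a
# hypothetical objective that simply prefers brighter outputs.
def velocity_field(x, t):
    return -x * (1.0 - t)

def reward(x):
    return x.mean()

def simulate(x_src, controls, ts):
    """Roll out the controlled reverse trajectory x_{k+1} = x_k + (v(x_k, t_k) + u_k) * dt."""
    xs = [x_src]
    for k in range(len(controls)):
        dt = ts[k + 1] - ts[k]
        xs.append(xs[-1] + (velocity_field(xs[-1], ts[k]) + controls[k]) * dt)
    return xs

def adjoint_gradients(xs, controls, ts, lam=0.1):
    """One backward sweep of adjoint states a_k = dJ/dx_k, yielding the gradient of the
    control objective J = -reward(x_T) + lam * sum ||u_k||^2 w.r.t. every control u_k."""
    grads = [None] * len(controls)
    x_T = xs[-1].detach().requires_grad_(True)
    a = torch.autograd.grad(-reward(x_T), x_T)[0]            # terminal adjoint a_T
    for k in reversed(range(len(controls))):
        dt = ts[k + 1] - ts[k]
        grads[k] = a * dt + 2.0 * lam * controls[k]          # dJ/du_k
        # Propagate the adjoint one step back with a vector-Jacobian product of the drift,
        # so no graph over the whole unrolled trajectory is ever stored.
        x_k = xs[k].detach().requires_grad_(True)
        x_next = x_k + (velocity_field(x_k, ts[k]) + controls[k]) * dt
        a = torch.autograd.grad(x_next, x_k, grad_outputs=a)[0]
    return grads

# Iteratively optimize the control trajectory while the source state stays fixed.
torch.manual_seed(0)
x_src = torch.rand(3, 8, 8)                                  # toy "source image"
ts = torch.linspace(0.0, 1.0, 11)
controls = [torch.zeros_like(x_src) for _ in range(len(ts) - 1)]
for _ in range(50):
    xs = simulate(x_src, controls, ts)
    grads = adjoint_gradients(xs, controls, ts)
    controls = [u - 0.5 * g for u, g in zip(controls, grads)]

print("edited reward:", reward(simulate(x_src, controls, ts)[-1]).item())
```

In this toy setup the control penalty term is what keeps the rollout anchored to the source trajectory while the terminal reward is pushed up; the actual method's fidelity mechanism and guidance schedule are described in the paper itself.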