🤖 AI Summary
Existing reward alignment methods for diffusion and flow models typically rely on multi-step stochastic trajectories, and existing noise-space optimization approaches require backpropagation through the generator and reward pipeline, which limits them to stochastic or differentiable settings. This work proposes ZeNO (Zeroth-order Noise Optimization), a framework that formulates noise optimization as a path-integral control problem, enabling gradient-free optimization in the latent noise space using only zeroth-order reward evaluations. Because it needs no gradients, ZeNO does not require the generator or the reward function to be differentiable. When instantiated with an Ornstein–Uhlenbeck reference process, the noise update connects to Langevin dynamics that implicitly target a reward-tilted distribution. Experiments show that ZeNO supports inference-time scaling and performs strongly across diverse generative models and reward functions, including a protein structure generation task where backpropagation is infeasible.
📝 Abstract
Existing reward alignment methods for diffusion and flow models rely on multi-step stochastic trajectories, making them difficult to extend to deterministic generators. A natural alternative is noise-space optimization, but existing approaches require backpropagation through the generator and reward pipeline, limiting applicability to differentiable settings. To address this, we present ZeNO (Zeroth-order Noise Optimization), a gradient-free framework that formulates noise optimization as a path-integral control problem whose solution can be estimated from zeroth-order reward evaluations alone. When instantiated with an Ornstein–Uhlenbeck reference process, the update connects to Langevin dynamics that implicitly target a reward-tilted distribution. ZeNO enables effective inference-time scaling and demonstrates strong performance across diverse generators and reward functions, including a protein structure generation task where backpropagation is infeasible.
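To make the idea of optimizing noise from reward values alone more concrete, the following is a minimal sketch of a generic zeroth-order, path-integral-style update. It is not the ZeNO algorithm, whose details the abstract does not give. It assumes proposals of the form sqrt(1 - beta) * z + sqrt(beta) * eps (a common discretization of an Ornstein–Uhlenbeck step) and exponential reward weights with a temperature. All names (`zeroth_order_noise_search`, `softmax_weights`, `toy_generator`, `toy_reward`, `beta`, `temperature`) are hypothetical, and the toy generator and reward are stand-ins for a real model and scoring function.

```python
import math
import random


def softmax_weights(rewards, temperature):
    """Exponential (path-integral style) weights, normalized to sum to 1."""
    top = max(rewards)  # subtract the max for numerical stability
    unnorm = [math.exp((r - top) / temperature) for r in rewards]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def zeroth_order_noise_search(generator, reward, dim, steps=50, samples=16,
                              beta=0.1, temperature=0.5, seed=0):
    """Gradient-free search over the noise vector z using reward values only."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    best_z, best_r = list(z), reward(generator(z))
    keep, mix = math.sqrt(1.0 - beta), math.sqrt(beta)
    for _ in range(steps):
        # Assumed OU-style proposals: shrink z toward 0, mix in fresh Gaussian noise.
        eps = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(samples)]
        cands = [[keep * zi + mix * e for zi, e in zip(z, ek)] for ek in eps]
        rewards = [reward(generator(c)) for c in cands]  # black-box calls only
        for c, r in zip(cands, rewards):
            if r > best_r:
                best_z, best_r = list(c), r
        # The reward-weighted average of the perturbations gives the update direction.
        w = softmax_weights(rewards, temperature)
        drift = [sum(wk * ek[i] for wk, ek in zip(w, eps)) for i in range(dim)]
        z = [keep * zi + mix * d for zi, d in zip(z, drift)]
    return best_z, best_r


# Toy stand-ins (assumptions): a quantizing "generator" and a distance-based reward.
def toy_generator(z):
    return [round(2.0 * v) / 2.0 for v in z]  # piecewise constant, no useful gradient


def toy_reward(x):
    return -sum((v - 1.0) ** 2 for v in x)


best_z, best_r = zeroth_order_noise_search(toy_generator, toy_reward, dim=4)
```

The generator and reward are only ever called, never differentiated, which is the property that lets this style of update apply to non-differentiable pipelines. In this sketch, raising `samples` or `steps` spends more reward evaluations at inference time, and `temperature` controls how sharply the weights concentrate on high-reward candidates.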