AI Summary
This work addresses optimal control and conditional generation for diffusion processes under singular terminal rewards, e.g., an infinite reward when the process hits a target state exactly. Conventional methods rely on reward gradients and therefore cannot robustly handle such nonsmooth, singular rewards. To overcome this, the authors propose a theoretical framework grounded in Malliavin calculus and integration by parts on path space, which enables robust modeling of singular rewards without artificial observational noise. Because the approach does not require the reward function to be differentiable, it supports rigorous variational analysis and stable training. Experiments show improvements over existing techniques in diffusion bridge construction, conditional classification, and conditioning without observational noise. The framework offers a new approach to stochastic control and generative modeling driven by singular rewards.
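For orientation, here is a standard KL-regularised formulation of "steering a reference diffusion towards a terminal reward". This is a sketch under common assumptions (drift $b$, constant diffusion matrix $\sigma$, control $u$, reward $r$); the symbols are illustrative and not necessarily the paper's own notation. It shows where reward gradients enter and why a singular reward breaks them.

```latex
\begin{aligned}
&\text{Reference:} && dX_t = b(X_t,t)\,dt + \sigma\,dW_t, \qquad X_0 = x_0,\\
&\text{Controlled:} && dX^u_t = \big(b(X^u_t,t) + \sigma\,u_t\big)\,dt + \sigma\,dW_t,\\
&\text{Objective:} && \max_u\; \mathbb{E}\Big[\, r(X^u_T) - \tfrac12 \int_0^T \lVert u_t \rVert^2\,dt \Big],\\
&\text{Optimal control:} && u^*(t,x) = \sigma^\top \nabla_x \log h(t,x),
  \qquad h(t,x) = \mathbb{E}\big[\, e^{r(X_T)} \,\big|\, X_t = x \big],\\
&\text{Pathwise gradient:} && \nabla_x h(t,x) = \mathbb{E}\big[\, J_{t,T}^\top\, \nabla r(X_T)\, e^{r(X_T)} \,\big|\, X_t = x \big],
  \qquad J_{t,T} = \partial X_T / \partial X_t,\\
&\text{Bridge to } y\text{:} && e^{r(X_T)} \;\leadsto\; \delta_y(X_T), \qquad h(t,x) = p(T, y \mid t, x).
\end{aligned}
```

In the bridge case $\nabla r$ does not exist, so the pathwise gradient formula is unusable, yet $h$ is still the transition density and $u^*$ is the Doob $h$-transform drift. For standard Brownian motion ($b = 0$, $\sigma = I$) it reduces to the familiar Brownian-bridge drift $u^*(t,x) = (y - x)/(T - t)$.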
Abstract
In stochastic optimal control and conditional generative modelling, a central computational task is to modify a reference diffusion process to maximise a given terminal-time reward. Most existing methods require this reward to be differentiable, using gradients to steer the diffusion towards favourable outcomes. However, in many practical settings, like diffusion bridges, the reward is singular, taking an infinite value if the target is hit and zero otherwise. We introduce a novel framework, based on Malliavin calculus and path-space integration by parts, that enables the development of methods robust to such singular rewards. This allows our approach to handle a broad range of applications, including classification, diffusion bridges, and conditioning without the need for artificial observational noise. We demonstrate that our approach offers stable and reliable training, outperforming existing techniques.
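The benefit of integration by parts on path space can be seen in its simplest instance: for a one-dimensional Brownian motion, the classical Bismut-Elworthy-Li (Malliavin weight) identity $\frac{d}{dx}\mathbb{E}[f(x + W_T)] = \mathbb{E}[f(x + W_T)\, W_T / T]$ moves the derivative off the reward $f$ and onto the noise. The sketch below is a toy illustration of that idea, not the paper's method. The indicator reward and the helper names (`pathwise_grad`, `malliavin_grad`, `exact_grad`) are hypothetical choices made for this example.

```python
import math

import numpy as np


def indicator_reward(y):
    """Hypothetical nonsmooth terminal reward: 1 if the endpoint is above 0, else 0."""
    return (y > 0.0).astype(float)


def indicator_reward_derivative(y):
    """Almost-everywhere derivative of the indicator: identically zero."""
    return np.zeros_like(y)


def pathwise_grad(x, T, n_samples, rng):
    """Pathwise (reparameterisation) estimate of d/dx E[f(x + W_T)] = E[f'(x + W_T)]."""
    w_T = rng.normal(0.0, math.sqrt(T), size=n_samples)
    return float(np.mean(indicator_reward_derivative(x + w_T)))


def malliavin_grad(x, T, n_samples, rng):
    """Integration-by-parts estimate E[f(x + W_T) * W_T / T]; no derivative of f needed."""
    w_T = rng.normal(0.0, math.sqrt(T), size=n_samples)
    return float(np.mean(indicator_reward(x + w_T) * w_T / T))


def exact_grad(x, T):
    """Closed form: d/dx P(x + W_T > 0) = d/dx Phi(x / sqrt(T)) = phi(x / sqrt(T)) / sqrt(T)."""
    z = x / math.sqrt(T)
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi * T)


rng = np.random.default_rng(0)
x, T, n = 0.3, 1.0, 200_000
pw = pathwise_grad(x, T, n, rng)
mw = malliavin_grad(x, T, n, rng)
ex = exact_grad(x, T)
print(f"pathwise={pw:.4f}  malliavin={mw:.4f}  exact={ex:.4f}")
```

The pathwise estimate is exactly zero because the indicator's derivative vanishes almost everywhere. The weighted estimate needs only evaluations of the reward and converges to the true derivative (about 0.381 at x = 0.3, T = 1). The price is higher Monte Carlo variance, which is one reason robust training schemes built on such identities take care in practice.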