Training-Free Adaptation of Diffusion Models via Doob's $h$-Transform

📅 2026-02-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes DOIT (Doob-Oriented Inference-time Transformation), a training-free approach for adapting diffusion models to arbitrary reward functions without imposing strong assumptions such as differentiability of the reward. By introducing Doob's $h$-transform into the diffusion framework, DOIT dynamically corrects the sampling process at inference time, enabling principled adaptation to non-differentiable rewards. Grounded in measure transport theory, the method comes with high-probability guarantees of convergence to the target high-reward distribution. Empirically, on the D4RL offline reinforcement learning benchmark, DOIT consistently outperforms existing methods while preserving sampling efficiency.
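
For context, the standard discrete-time Doob $h$-transform underlying this kind of correction can be stated as follows; the exact instantiation used by DOIT may differ, so treat this as a reference form rather than the paper's definition:

```latex
% Reference form of the discrete-time Doob h-transform (standard theory;
% the paper's exact instantiation may differ). Let p(x_{t-1} | x_t) be the
% pre-trained reverse (denoising) kernel and r a reward on final samples.
\[
  h_t(x) \;=\; \mathbb{E}\!\left[\exp\!\big(\lambda\, r(X_0)\big) \,\middle|\, X_t = x\right],
  \qquad
  p^{h}(x_{t-1} \mid x_t) \;=\; p(x_{t-1} \mid x_t)\,
  \frac{h_{t-1}(x_{t-1})}{h_t(x_t)} .
\]
% Since h is a martingale under p, p^h is a valid transition kernel, and
% chaining the tilted steps reweights the terminal law by exp(lambda * r(x_0)),
% i.e., it transports the pre-trained distribution toward high reward.
```

Because only ratios of $h$ enter the tilted kernel, the correction can in principle be estimated by simulation (e.g., rollouts of the pre-trained sampler) without ever differentiating the reward.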

📝 Abstract
Adaptation methods have been a workhorse for unlocking the transformative power of pre-trained diffusion models in diverse applications. Existing approaches often abstract adaptation objectives as a reward function and steer diffusion models to generate high-reward samples. However, these approaches can incur high computational overhead due to additional training, or rely on stringent assumptions on the reward such as differentiability. Moreover, despite their empirical success, theoretical justification and guarantees are seldom established. In this paper, we propose DOIT (Doob-Oriented Inference-time Transformation), a training-free and computationally efficient adaptation method that applies to generic, non-differentiable rewards. The key framework underlying our method is a measure transport formulation that seeks to transport the pre-trained generative distribution to a high-reward target distribution. We leverage Doob's $h$-transform to realize this transport, which induces a dynamic correction to the diffusion sampling process and enables efficient simulation-based computation without modifying the pre-trained model. Theoretically, we establish a high-probability convergence guarantee to the target high-reward distribution by characterizing the approximation error in the dynamic Doob's correction. Empirically, on D4RL offline RL benchmarks, our method consistently outperforms state-of-the-art baselines while preserving sampling efficiency.
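
To make the inference-time mechanism concrete, here is a minimal sketch of one simulation-based way such a dynamic correction can be realized: particles are propagated with a (toy) pre-trained denoiser and resampled by incremental twist weights $h_{t-1}/h_t$, with $h$ approximated via a cheap one-shot $x_0$ prediction. All names (`denoise_step`, `predict_x0`, `lam`) and the toy model are illustrative assumptions, not the paper's algorithm:

```python
# Minimal sketch of inference-time Doob-h-style steering via particle
# resampling. The pre-trained model is replaced by a toy two-mode Gaussian
# denoiser so the script is self-contained and runnable.
import numpy as np

rng = np.random.default_rng(0)

T = 50     # number of reverse-diffusion steps
N = 512    # number of particles
lam = 4.0  # reward temperature lambda in h_t(x) ~ E[exp(lam * r(x0)) | x_t]

def denoise_step(x, t):
    """Toy stand-in for a pre-trained reverse kernel p(x_{t-1} | x_t).

    Pulls each particle toward the nearer of two modes (-2 and +2) and
    adds step-dependent noise, mimicking a trained denoiser.
    """
    target = np.where(x < 0.0, -2.0, 2.0)
    alpha = 1.0 / (t + 1)  # pull harder as t -> 0
    noise = rng.normal(0.0, np.sqrt(t / T), size=x.shape)
    return x + alpha * (target - x) + 0.1 * noise

def predict_x0(x):
    """Cheap one-shot x0 prediction used to approximate the h-function."""
    return np.where(x < 0.0, -2.0, 2.0)

def reward(x0):
    """Non-differentiable reward: indicator of the positive mode."""
    return (x0 > 0.0).astype(float)

# h_t(x) is approximated by exp(lam * r(x0_hat(x))); the incremental
# weight at each step is the ratio h_{t-1}(x_{t-1}) / h_t(x_t).
x = rng.normal(0.0, 1.0, size=N)  # x_T ~ N(0, 1)
log_h = lam * reward(predict_x0(x))
for t in range(T, 0, -1):
    x_new = denoise_step(x, t)
    log_h_new = lam * reward(predict_x0(x_new))
    logw = log_h_new - log_h           # incremental twist weights
    w = np.exp(logw - logw.max())
    w /= w.sum()
    idx = rng.choice(N, size=N, p=w)   # multinomial resampling
    x, log_h = x_new[idx], log_h_new[idx]

print(f"fraction of samples at the high-reward mode: {(x > 0).mean():.3f}")
```

Note that the scheme only ever *evaluates* the reward, so indicator-style or ranking-based rewards are handled as naturally as smooth ones, consistent with the paper's claim of supporting generic, non-differentiable rewards.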
Problem

Research questions and friction points this paper is trying to address.

diffusion models · model adaptation · reward function · training-free · Doob's $h$-transform

Innovation

Methods, ideas, or system contributions that make the work stand out.

Doob's $h$-transform · training-free adaptation · diffusion models · measure transport · non-differentiable rewards