🤖 AI Summary
Diffusion models face fundamental challenges in unsupervised posterior inference—e.g., zero-shot compositional modeling and offline RL policy optimization—when the prior is a diffusion process and constraints are black-box, rendering conventional posterior sampling intractable. To address this, we propose the Relative Trajectory Balance (RTB) objective, the first to provably guarantee asymptotically correct, data-free learning of the constrained posterior. RTB enables unbiased, scalable sampling from the true posterior under arbitrary black-box constraints. Our method draws on the generative flow network perspective on diffusion models, allowing deep reinforcement learning techniques to be applied to both discrete and continuous diffusion dynamics, including score-based behavior priors. Evaluated across classifier-guided generation, discrete language infilling, text-to-image synthesis, and offline RL, RTB achieves state-of-the-art performance, significantly improving both posterior sampling accuracy and mode coverage.
📝 Abstract
Diffusion models have emerged as effective distribution estimators in vision, language, and reinforcement learning, but their use as priors in downstream tasks poses an intractable posterior inference problem. This paper studies amortized sampling of the posterior over data, $\mathbf{x}\sim p^{\mathrm{post}}(\mathbf{x})\propto p(\mathbf{x})r(\mathbf{x})$, in a model that consists of a diffusion generative model prior $p(\mathbf{x})$ and a black-box constraint or likelihood function $r(\mathbf{x})$. We state and prove the asymptotic correctness of a data-free learning objective, relative trajectory balance, for training a diffusion model that samples from this posterior, a problem that existing methods solve only approximately or in restricted cases. Relative trajectory balance arises from the generative flow network perspective on diffusion models, which allows the use of deep reinforcement learning techniques to improve mode coverage. Experiments illustrate the broad potential of unbiased inference of arbitrary posteriors under diffusion priors: in vision (classifier guidance), language (infilling under a discrete diffusion LLM), and multimodal data (text-to-image generation). Beyond generative modeling, we apply relative trajectory balance to the problem of continuous control with a score-based behavior prior, achieving state-of-the-art results on benchmarks in offline reinforcement learning.
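To make the objective concrete, here is a minimal sketch (not the paper's implementation) of a relative-trajectory-balance-style loss: at the optimum, $Z \cdot p^{\mathrm{post}}(\tau) = p(\tau)\, r(\mathbf{x})$ holds for every denoising trajectory $\tau$, so the squared log-residual of that balance condition is driven to zero. All function and argument names below are illustrative assumptions; in practice the log-likelihoods would come from the posterior and prior diffusion models and the loss would be minimized over sampled trajectories.

```python
def rtb_loss(log_p_post_traj, log_p_prior_traj, log_r_x, log_Z):
    """Illustrative relative trajectory balance loss for a batch of trajectories.

    log_p_post_traj:  per-trajectory log-likelihoods under the trainable
                      posterior diffusion model
    log_p_prior_traj: log-likelihoods of the same trajectories under the
                      frozen prior diffusion model
    log_r_x:          log of the black-box constraint r(x) at each final sample
    log_Z:            (learned) log normalizing constant of the posterior
    """
    # Residual of the balance condition log Z + log p_post(tau)
    #   = log p_prior(tau) + log r(x); zero for all trajectories at optimum.
    residuals = [
        log_Z + lp_post - lp_prior - lr
        for lp_post, lp_prior, lr in zip(log_p_post_traj, log_p_prior_traj, log_r_x)
    ]
    # Mean squared residual over the batch.
    return sum(d * d for d in residuals) / len(residuals)
```

Because the residual depends only on log-ratios of trajectory likelihoods and the constraint value, no posterior data samples are needed, which is what makes the objective data-free.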