🤖 AI Summary
This work addresses the intractability of conditional posterior inference, i.e., sampling from $p(\mathbf{x} \mid \mathbf{y})$, in generative models. It proposes a diffusion-sampling paradigm that operates in the noise latent space $\mathbf{z}$ rather than the data space $\mathbf{x}$. The core idea is 'outsourcing': shifting the hard posterior inference problem into the smoother Gaussian noise space and training a diffusion sampler there with policy-gradient reinforcement learning (PPO/TRPO), so that the pushed-forward samples $f_\theta(\mathbf{z})$ follow the data-space posterior while the pretrained generator $f_\theta$ itself stays fixed. The resulting framework is architecture-agnostic (compatible with pre-trained GANs, VAEs, and normalizing flows) and plug-and-play, requiring no modification to existing unconditional generators. It enables end-to-end amortized inference without task-specific retraining. Experiments on conditional image generation, human-feedback-driven RL fine-tuning, and protein structure modeling demonstrate substantial improvements over both amortized and non-amortized baselines.
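To make the 'outsourcing' step concrete (this is the standard change of variables behind the summary above, restated from the abstract's definitions rather than quoted from the paper): since $\mathbf{x} = f_\theta(\mathbf{z})$ with $\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$, the data-space posterior $p(\mathbf{x} \mid \mathbf{y}) \propto p_\theta(\mathbf{x})\,r(\mathbf{x}, \mathbf{y})$ is the pushforward of an equivalent target over the noise variable,

$$p(\mathbf{z} \mid \mathbf{y}) \;\propto\; \mathcal{N}(\mathbf{z};\, \mathbf{0}, \mathbf{I})\; r\big(f_\theta(\mathbf{z}),\, \mathbf{y}\big),$$

so sampling $\mathbf{z}$ from this noise-space distribution and computing $\mathbf{x} = f_\theta(\mathbf{z})$ yields posterior samples in data space. The diffusion sampler is what learns to draw from this noise-space target.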
📄 Abstract
Any well-behaved generative model over a variable $\mathbf{x}$ can be expressed as a deterministic transformation of an exogenous ('outsourced') Gaussian noise variable $\mathbf{z}$: $\mathbf{x} = f_\theta(\mathbf{z})$. In such a model (e.g., a VAE, GAN, or continuous-time flow-based model), sampling of the target variable $\mathbf{x} \sim p_\theta(\mathbf{x})$ is straightforward, but sampling from a posterior distribution of the form $p(\mathbf{x} \mid \mathbf{y}) \propto p_\theta(\mathbf{x})\,r(\mathbf{x}, \mathbf{y})$, where $r$ is a constraint function depending on an auxiliary variable $\mathbf{y}$, is generally intractable. We propose to amortize the cost of sampling from such posterior distributions with diffusion models that sample a distribution in the noise space ($\mathbf{z}$). These diffusion samplers are trained by reinforcement learning algorithms to enforce that the transformed samples $f_\theta(\mathbf{z})$ are distributed according to the posterior in the data space ($\mathbf{x}$). For many models and constraints of interest, the posterior in the noise space is smoother than the posterior in the data space, making it more amenable to such amortized inference. Our method enables conditional sampling under unconditional GAN, (H)VAE, and flow-based priors, comparing favorably with both current amortized and non-amortized inference methods. We demonstrate the proposed outsourced diffusion sampling in several experiments with large pretrained prior models: conditional image generation, reinforcement learning with human feedback, and protein structure generation.
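The paper's trained diffusion sampler is not reproduced here; as a minimal sketch of why the noise-space target is convenient to work with, the toy example below uses self-normalized importance sampling in $\mathbf{z}$-space in place of a learned sampler. All names (`f_theta`, `constraint`, the 2-D generator) are hypothetical stand-ins, not the paper's models.

```python
import numpy as np

# Minimal sketch (not the paper's method): targets the noise-space posterior
#   p(z | y) ∝ N(z; 0, I) · r(f_theta(z), y)
# with self-normalized importance sampling standing in for the trained
# diffusion sampler. `f_theta` and `constraint` are hypothetical stand-ins
# for a pretrained generator and the constraint function r.

rng = np.random.default_rng(0)

def f_theta(z):
    """Toy deterministic generator x = f_theta(z) (stand-in for a GAN/VAE decoder)."""
    return np.tanh(z @ np.array([[1.0, 0.5], [-0.5, 1.0]]))

def constraint(x, y):
    """Toy constraint r(x, y): soft preference for samples x near the condition y."""
    return np.exp(-np.sum((x - y) ** 2, axis=-1))

def posterior_samples_via_noise_space(y, n_proposals=10_000, n_out=100):
    # 1) Propose from the Gaussian prior over the noise variable z.
    z = rng.standard_normal((n_proposals, 2))
    # 2) Weight each proposal by the constraint evaluated on f_theta(z);
    #    this ratio is exactly the noise-space posterior up to normalization.
    w = constraint(f_theta(z), y)
    w = w / w.sum()
    # 3) Resample z according to the weights, then push forward through f_theta.
    idx = rng.choice(n_proposals, size=n_out, p=w)
    return f_theta(z[idx])

samples = posterior_samples_via_noise_space(y=np.array([0.5, -0.2]))
print(samples.mean(axis=0))  # posterior-mean estimate under the toy model
```

Importance sampling scales poorly in high dimensions, which is precisely the gap the paper's amortized diffusion sampler is meant to fill; the sketch only illustrates that reweighting the Gaussian prior in $\mathbf{z}$-space and pushing forward through $f_\theta$ recovers the data-space posterior.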