🤖 AI Summary
This work addresses the intractability of conditional posterior inference, i.e., sampling from $p(\mathbf{x} \mid \mathbf{y})$, in generative models. It proposes a diffusion-sampling paradigm that operates in the noise latent space $\mathbf{z}$ rather than the data space $\mathbf{x}$. The core idea is 'outsourcing': shifting the hard posterior inference problem into the smoother Gaussian noise space and training a diffusion sampler there with policy-gradient reinforcement learning (PPO/TRPO), so that the pushed-forward samples $f_\theta(\mathbf{z})$ follow the data-space posterior while the pretrained generator $f_\theta$ itself stays fixed. The resulting framework is architecture-agnostic (compatible with pre-trained GANs, VAEs, and normalizing flows) and plug-and-play, requiring no modification to existing unconditional generators. It enables end-to-end amortized inference without task-specific retraining. Experiments on conditional image generation, human-feedback-driven RL fine-tuning, and protein structure modeling demonstrate substantial improvements over both amortized and non-amortized baselines.
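To make the 'outsourcing' step concrete (this is the standard change of variables behind the summary above, restated from the abstract's definitions rather than quoted from the paper): since $\mathbf{x} = f_\theta(\mathbf{z})$ with $\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$, the data-space posterior $p(\mathbf{x} \mid \mathbf{y}) \propto p_\theta(\mathbf{x})\,r(\mathbf{x}, \mathbf{y})$ is the pushforward of an equivalent target over the noise variable,

$$p(\mathbf{z} \mid \mathbf{y}) \;\propto\; \mathcal{N}(\mathbf{z};\, \mathbf{0}, \mathbf{I})\; r\big(f_\theta(\mathbf{z}),\, \mathbf{y}\big),$$

so sampling $\mathbf{z}$ from this noise-space distribution and computing $\mathbf{x} = f_\theta(\mathbf{z})$ yields posterior samples in data space. The diffusion sampler is what learns to draw from this noise-space target.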
📄 Abstract
Any well-behaved generative model over a variable $\mathbf{x}$ can be expressed as a deterministic transformation of an exogenous ('outsourced') Gaussian noise variable $\mathbf{z}$: $\mathbf{x} = f_\theta(\mathbf{z})$. In such a model (e.g., a VAE, GAN, or continuous-time flow-based model), sampling of the target variable $\mathbf{x} \sim p_\theta(\mathbf{x})$ is straightforward, but sampling from a posterior distribution of the form $p(\mathbf{x} \mid \mathbf{y}) \propto p_\theta(\mathbf{x})\,r(\mathbf{x}, \mathbf{y})$, where $r$ is a constraint function depending on an auxiliary variable $\mathbf{y}$, is generally intractable. We propose to amortize the cost of sampling from such posterior distributions with diffusion models that sample a distribution in the noise space ($\mathbf{z}$). These diffusion samplers are trained by reinforcement learning algorithms to enforce that the transformed samples $f_\theta(\mathbf{z})$ are distributed according to the posterior in the data space ($\mathbf{x}$). For many models and constraints of interest, the posterior in the noise space is smoother than the posterior in the data space, making it more amenable to such amortized inference. Our method enables conditional sampling under unconditional GAN, (H)VAE, and flow-based priors, comparing favorably with both current amortized and non-amortized inference methods. We demonstrate the proposed outsourced diffusion sampling in several experiments with large pretrained prior models: conditional image generation, reinforcement learning with human feedback, and protein structure generation.
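The paper's trained diffusion sampler is not reproduced here; as a minimal sketch of why the noise-space target is convenient to work with, the toy example below uses self-normalized importance sampling in $\mathbf{z}$-space in place of a learned sampler. All names (`f_theta`, `constraint`, the 2-D generator) are hypothetical stand-ins, not the paper's models.

```python
import numpy as np

# Minimal sketch (not the paper's method): targets the noise-space posterior
#   p(z | y) ∝ N(z; 0, I) · r(f_theta(z), y)
# with self-normalized importance sampling standing in for the trained
# diffusion sampler. `f_theta` and `constraint` are hypothetical stand-ins
# for a pretrained generator and the constraint function r.

rng = np.random.default_rng(0)

def f_theta(z):
    """Toy deterministic generator x = f_theta(z) (stand-in for a GAN/VAE decoder)."""
    return np.tanh(z @ np.array([[1.0, 0.5], [-0.5, 1.0]]))

def constraint(x, y):
    """Toy constraint r(x, y): soft preference for samples x near the condition y."""
    return np.exp(-np.sum((x - y) ** 2, axis=-1))

def posterior_samples_via_noise_space(y, n_proposals=10_000, n_out=100):
    # 1) Propose from the Gaussian prior over the noise variable z.
    z = rng.standard_normal((n_proposals, 2))
    # 2) Weight each proposal by the constraint evaluated on f_theta(z);
    #    this ratio is exactly the noise-space posterior up to normalization.
    w = constraint(f_theta(z), y)
    w = w / w.sum()
    # 3) Resample z according to the weights, then push forward through f_theta.
    idx = rng.choice(n_proposals, size=n_out, p=w)
    return f_theta(z[idx])

samples = posterior_samples_via_noise_space(y=np.array([0.5, -0.2]))
print(samples.mean(axis=0))  # posterior-mean estimate under the toy model
```

Importance sampling scales poorly in high dimensions, which is precisely the gap the paper's amortized diffusion sampler is meant to fill; the sketch only illustrates that reweighting the Gaussian prior in $\mathbf{z}$-space and pushing forward through $f_\theta$ recovers the data-space posterior.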