A Black-Box Debiasing Framework for Conditional Sampling

📅 2025-10-13

🤖 AI Summary
This paper addresses posterior sampling bias in conditional generative models that arises from the randomness of finite training data $D = \{X_i\}_{i=1}^n$: standard algorithms, such as Bayes' rule or a conditional generative model, produce an approximate posterior $f(\hat{\pi}_{X^n})$ whose expectation over the training data deviates from the target $f(\pi_X)$. To reduce this bias, the paper proposes a black-box debiasing framework that yields a $k$-th order approximation of the true posterior $P_{X|Y=y^*}$ for any positive integer $k$, without altering the underlying algorithm or degrading the variance. The method combines $k$ generated sample distributions through carefully designed weights and relies on a bounded likelihood and smoothness of $f$; the accuracy gain can be read as improving memorization without compromising generalization. Experiments demonstrate improved conditional sampling accuracy across diverse settings.

📝 Abstract
Conditional sampling is a fundamental task in Bayesian statistics and generative modeling. Consider the problem of sampling from the posterior distribution $P_{X|Y=y^*}$ for some observation $y^*$, where the likelihood $P_{Y|X}$ is known, and we are given $n$ i.i.d. samples $D=\{X_i\}_{i=1}^n$ drawn from an unknown prior distribution $\pi_X$. Suppose that $f(\hat{\pi}_{X^n})$ is the distribution of a posterior sample generated by an algorithm (e.g. a conditional generative model or the Bayes rule) when $\hat{\pi}_{X^n}$ is the empirical distribution of the training data. Although $\mathbb{E}_D\left(\hat{\pi}_{X^n}\right)= \pi_X$ when averaging over the randomness of the training data $D$, in general $\mathbb{E}_D\left\{f(\hat{\pi}_{X^n})\right\} \neq f(\pi_X)$ because $f$ is nonlinear, which leads to a bias. In this paper we propose a black-box debiasing scheme that improves the accuracy of such a naive plug-in approach. For any integer $k$ and under boundedness of the likelihood and smoothness of $f$, we generate samples $\hat{X}^{(1)},\dots,\hat{X}^{(k)}$ and weights $w_1,\dots,w_k$ such that $\sum_{i=1}^k w_i P_{\hat{X}^{(i)}}$ is a $k$-th order approximation of $f(\pi_X)$, where the generation process treats $f$ as a black-box. Our generation process achieves higher accuracy when averaged over the randomness of the training data, without degrading the variance, which can be interpreted as improving memorization without compromising generalization in generative models.
Problem

Research questions and friction points this paper is trying to address.

Debiasing conditional sampling from posterior distributions
Correcting bias in black-box generative model outputs
Improving accuracy of plug-in Bayesian estimation methods
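
The plug-in bias can be seen in a one-line calculation. The following worked example is only an illustration under assumed choices that do not come from the paper: $X_i \sim \mathrm{Bernoulli}(p)$, the empirical mean $\hat{p}$ standing in for the empirical distribution, and the nonlinear map $f(p) = p^2$.

```latex
% Assumed toy setting: X_1, ..., X_n i.i.d. Bernoulli(p), f(p) = p^2.
\begin{align*}
\hat{p} &= \frac{1}{n}\sum_{i=1}^{n} X_i,
  \qquad \mathbb{E}_D\left(\hat{p}\right) = p, \\
\mathbb{E}_D\left\{f(\hat{p})\right\}
  &= \operatorname{Var}(\hat{p}) + \left(\mathbb{E}_D\,\hat{p}\right)^2
   = p^2 + \frac{p(1-p)}{n}, \\
\mathbb{E}_D\left\{f(\hat{p})\right\} - f(p)
  &= \frac{p(1-p)}{n} \neq 0 \quad \text{for } 0 < p < 1 .
\end{align*}
```

The estimator $\hat{p}$ is unbiased, yet the plug-in $f(\hat{p})$ carries a first-order bias of size $1/n$ purely because $f$ is nonlinear.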
Innovation

Methods, ideas, or system contributions that make the work stand out.

Black-box debiasing framework for conditional sampling bias
Generates weighted samples for higher-order bias correction
Improves memorization without compromising generalization accuracy
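
The idea of combining several generated outputs with signed weights can be sketched on a toy problem. This is a minimal sketch, assuming a Bernoulli parameter `p` in place of the prior, the map `f(p) = p**2` in place of the black-box algorithm, and an iterated-resampling construction with weights `w_i = (-1)**i * C(k, i+1)`. That construction is a standard higher-order bias-correction device chosen here for illustration; whether it matches the paper's exact scheme is an assumption. All function names are hypothetical.

```python
from math import comb


def resample_average(g, n):
    """Return the function p -> E[g(p_hat)], where n * p_hat ~ Binomial(n, p).

    This is the exact average of a black-box g over all training sets of size n.
    """
    def averaged(p):
        return sum(
            comb(n, j) * p**j * (1 - p) ** (n - j) * g(j / n)
            for j in range(n + 1)
        )
    return averaged


def debias_weights(k):
    """Signed weights w_i = (-1)^i * C(k, i+1) for i = 0..k-1 (they sum to 1)."""
    return [(-1) ** i * comb(k, i + 1) for i in range(k)]


def debiased(f, n, k):
    """Combine f with its i-fold resampled versions using the signed weights."""
    iterates = [f]
    for _ in range(k - 1):
        # Each new iterate applies one more round of resampling to the previous one.
        iterates.append(resample_average(iterates[-1], n))
    weights = debias_weights(k)

    def combined(p):
        return sum(w * h(p) for w, h in zip(weights, iterates))
    return combined


def expected_bias(f, n, k, p):
    """Exact bias of the order-k combination, averaged over the training data."""
    return resample_average(debiased(f, n, k), n)(p) - f(p)


def square(p):
    # Assumed nonlinear stand-in for the black-box map f.
    return p * p
```

For this toy choice the bias works out to `p * (1 - p) / n**k`, so calling `expected_bias(square, 10, k, 0.3)` for `k = 1, 2, 3` shows the error shrinking by a factor of `n` with each additional term. The cost of exact evaluation grows like `(n + 1)**k`, which is why a practical scheme would sample rather than enumerate.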