A Black-Box Debiasing Framework for Conditional Sampling

📅 2025-10-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses posterior sampling bias in conditional generative models arising from randomness in the finite training data $D = \{X_i\}_{i=1}^n$: standard algorithms (such as Bayesian inference or conditional generative modeling) produce an approximate posterior $f(\hat{\pi}_{X^n})$ whose expectation over the data deviates from the true $f(\pi_X)$. To resolve this, the paper proposes the first black-box debiasing framework that achieves a $k$-th-order unbiased approximation to the true posterior $P_{X|Y=y^*}$, without altering the model architecture or inflating estimator variance. The method fuses multiple independently sampled distributions via carefully designed weights, grounded in empirical prior interpolation, bounded likelihood, and smoothness assumptions, balancing memorization and generalization. Experiments demonstrate substantial improvements in conditional sampling accuracy across diverse settings, with applicability to any positive integer $k$.

📝 Abstract

Conditional sampling is a fundamental task in Bayesian statistics and generative modeling. Consider the problem of sampling from the posterior distribution $P_{X|Y=y^*}$ for some observation $y^*$, where the likelihood $P_{Y|X}$ is known, and we are given $n$ i.i.d. samples $D=\{X_i\}_{i=1}^n$ drawn from an unknown prior distribution $\pi_X$. Suppose that $f(\hat{\pi}_{X^n})$ is the distribution of a posterior sample generated by an algorithm (e.g., a conditional generative model or Bayes' rule) when $\hat{\pi}_{X^n}$ is the empirical distribution of the training data. Averaging over the randomness of the training data $D$, we have $\mathbb{E}_D\left(\hat{\pi}_{X^n}\right)= \pi_X$, but we do not have $\mathbb{E}_D\left\{f(\hat{\pi}_{X^n})\right\}= f(\pi_X)$ because $f$ is nonlinear, and this leads to a bias. In this paper we propose a black-box debiasing scheme that improves the accuracy of such a naive plug-in approach. For any integer $k$ and under boundedness of the likelihood and smoothness of $f$, we generate samples $\hat{X}^{(1)},\dots,\hat{X}^{(k)}$ and weights $w_1,\dots,w_k$ such that $\sum_{i=1}^k w_i P_{\hat{X}^{(i)}}$ is a $k$-th order approximation of $f(\pi_X)$, where the generation process treats $f$ as a black box. Our generation process achieves higher accuracy when averaged over the randomness of the training data, without degrading the variance, which can be interpreted as improving memorization without compromising generalization in generative models.
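
The source of the bias can be made explicit with a heuristic second-order expansion. This is only a sketch under the smoothness assumption mentioned in the abstract; the symbol $\Delta$ and the derivative notation $f'$, $f''$ (derivatives of $f$ with respect to the prior) are introduced here for illustration and are not taken from the paper.

```latex
\begin{aligned}
\Delta &:= \hat{\pi}_{X^n} - \pi_X, \qquad \mathbb{E}_D(\Delta) = 0, \\
f(\hat{\pi}_{X^n}) &= f(\pi_X) + f'(\pi_X)[\Delta]
  + \tfrac{1}{2}\, f''(\pi_X)[\Delta, \Delta] + \cdots, \\
\mathbb{E}_D\left\{f(\hat{\pi}_{X^n})\right\} - f(\pi_X)
  &= \tfrac{1}{2}\, \mathbb{E}_D\left\{f''(\pi_X)[\Delta, \Delta]\right\} + \cdots
\end{aligned}
```

The linear term vanishes in expectation because $\mathbb{E}_D(\Delta) = 0$, so the leading bias comes from the quadratic term. For i.i.d. samples the second moments of $\Delta$ scale like $1/n$, which suggests that the plug-in bias is generically of order $1/n$ unless $f$ is linear.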

Problem

Research questions and friction points this paper is trying to address.

Debiasing conditional sampling from posterior distributions

Correcting bias in black-box generative model outputs

Improving accuracy of plug-in Bayesian estimation methods

Innovation

Methods, ideas, or system contributions that make the work stand out.

Black-box debiasing framework for conditional sampling bias

Generates weighted samples for higher-order bias correction

Improves memorization without compromising generalization
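
The idea of combining several plug-in outputs with weights so that low-order bias terms cancel can be illustrated on a toy problem. The sketch below is not the paper's construction: it uses classical two-point Richardson extrapolation (in the spirit of the generalized jackknife) with weights $2$ and $-1$ on plug-in posteriors built from $n$ and $n/2$ samples, on a hypothetical binary prior with made-up likelihood values. All function names and numbers are assumptions chosen for illustration.

```python
from math import comb


def posterior_prob(p, lik0, lik1):
    """Bayes' rule on X in {0, 1}: P(X=1 | Y=y*) when the prior is P(X=1)=p.

    lik0 and lik1 stand for P(Y=y* | X=0) and P(Y=y* | X=1).
    """
    return p * lik1 / (p * lik1 + (1.0 - p) * lik0)


def expected_plugin(n, p, lik0, lik1):
    """Exact E_D[f(p_hat)], where p_hat = K/n and K ~ Binomial(n, p)."""
    total = 0.0
    for k in range(n + 1):
        pmf = comb(n, k) * p**k * (1.0 - p) ** (n - k)
        total += pmf * posterior_prob(k / n, lik0, lik1)
    return total


# Hypothetical toy values (not from the paper).
p, lik0, lik1 = 0.3, 0.2, 0.9
n = 200

truth = posterior_prob(p, lik0, lik1)

# Bias of the naive plug-in posterior, averaged over training sets of size n.
bias_plugin = expected_plugin(n, p, lik0, lik1) - truth

# Richardson combination: if bias(n) = c/n + O(1/n^2), then
# 2 * E[f at n] - E[f at n/2] cancels the c/n term.
combined = 2.0 * expected_plugin(n, p, lik0, lik1) - expected_plugin(
    n // 2, p, lik0, lik1
)
bias_combined = combined - truth

print(f"plug-in bias:  {bias_plugin:.3e}")
print(f"combined bias: {bias_combined:.3e}")
```

The expectations are computed exactly by summing over the binomial distribution, so the comparison is free of Monte Carlo noise. The weights sum to one but one of them is negative, which is typical of extrapolation-style corrections; how the paper chooses its weights and samples for general $k$ is not specified in the abstract.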

🔎 Similar Papers

From Prejudice to Parity: A New Approach to Debiasing Large Language Model Word Embeddings