General Bayesian Policy Learning

📅 2026-02-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes a generalized Bayesian framework for policy learning that directly infers optimal decision rules—such as treatment assignments or portfolio allocations—from observational data, without explicitly modeling the potential outcomes of individual actions. The approach reformulates empirical welfare maximization as a regularized squared-error minimization problem and constructs a posterior distribution over policies by introducing a Gaussian pseudo-likelihood, thereby unifying Bayesian inference with decision-theoretic principles. A neural network with tanh-squashed outputs provides a tractable implementation of the policy class under the squared loss, enabling computationally feasible Bayesian inference over broad policy classes. The method further provides theoretical generalization guarantees through PAC-Bayes analysis, ensuring robust out-of-sample performance.
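As a rough illustration of the loss-based updating idea, the sketch below builds a generalized (Gibbs) posterior over a toy finite policy class by weighting each policy by the exponentiated negative squared loss, i.e. a Gaussian pseudo-likelihood. The data-generating process, the score `gamma`, the threshold policy class, and the choice `zeta = 1.0` are all illustrative assumptions, not the paper's construction.

```python
import math
import random

random.seed(0)

# Hypothetical data: scalar covariates x_i and outcome-difference scores
# gamma_i whose sign indicates which of two actions is better (assumed).
xs = [random.uniform(-1.0, 1.0) for _ in range(200)]
gamma = [2.0 * x + random.gauss(0.0, 0.5) for x in xs]
n = len(xs)

# A small finite policy class: threshold rules pi_t(x) = sign(x - t).
thresholds = [t / 10.0 for t in range(-10, 11)]

def policy(t, x):
    return 1.0 if x > t else -1.0

zeta = 1.0  # tuning parameter scaling the squared-loss surrogate

def surrogate_loss(t):
    """Average scaled squared error between gamma_i/zeta and the policy."""
    return (zeta / 2.0) * sum(
        (g / zeta - policy(t, x)) ** 2 for x, g in zip(xs, gamma)
    ) / n

# Generalized posterior: flat prior times exp(-n * loss), normalized.
# Subtracting the max log-weight keeps the exponentials numerically stable.
log_w = [-n * surrogate_loss(t) for t in thresholds]
m = max(log_w)
weights = [math.exp(lw - m) for lw in log_w]
total = sum(weights)
posterior = [w / total for w in weights]

# The posterior mass should concentrate near the welfare-maximizing rule,
# which is the threshold t = 0 under this data-generating process.
best = max(zip(thresholds, posterior), key=lambda p: p[1])[0]
print(best)
```

Because the pseudo-likelihood is exponential in the sample size, the posterior concentrates sharply on the lowest-loss policy; a real implementation over a continuous policy class (e.g. the tanh-squashed network the paper mentions) would replace the enumeration with posterior sampling or variational inference.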

📝 Abstract
This study proposes the General Bayes framework for policy learning. We consider decision problems in which a decision-maker chooses an action from an action set to maximize its expected welfare. Typical examples include treatment choice and portfolio selection. In such problems, the statistical target is a decision rule, and the prediction of each outcome $Y(a)$ is not necessarily of primary interest. We formulate this policy learning problem by loss-based Bayesian updating. Our main technical device is a squared-loss surrogate for welfare maximization. We show that maximizing empirical welfare over a policy class is equivalent to minimizing a scaled squared error in the outcome difference, up to a quadratic regularization controlled by a tuning parameter $\zeta>0$. This rewriting yields a General Bayes posterior over decision rules that admits a Gaussian pseudo-likelihood interpretation. We clarify two Bayesian interpretations of the resulting generalized posterior, a working Gaussian view and a decision-theoretic loss-based view. As one implementation example, we introduce neural networks with tanh-squashed outputs. Finally, we provide theoretical guarantees in a PAC-Bayes style.
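The welfare/squared-loss equivalence claimed in the abstract plausibly rests on a completing-the-square identity. Assuming (this form is an assumption, not taken from the paper) that empirical welfare is linear in the policy, $\widehat W(\pi)=\frac{1}{n}\sum_{i=1}^n \Gamma_i\,\pi(X_i)$ for some score $\Gamma_i$ of the outcome difference, then for each observation

$$
\frac{\zeta}{2}\left(\frac{\Gamma_i}{\zeta}-\pi(X_i)\right)^2
= \frac{\Gamma_i^2}{2\zeta} - \Gamma_i\,\pi(X_i) + \frac{\zeta}{2}\,\pi(X_i)^2 .
$$

Averaging over $i$, the first term does not depend on $\pi$, so

$$
\arg\min_{\pi}\ \frac{1}{n}\sum_{i=1}^n \frac{\zeta}{2}\left(\frac{\Gamma_i}{\zeta}-\pi(X_i)\right)^2
= \arg\max_{\pi}\ \widehat W(\pi) - \frac{\zeta}{2n}\sum_{i=1}^n \pi(X_i)^2 ,
$$

i.e. minimizing the scaled squared error maximizes empirical welfare up to a quadratic regularization controlled by $\zeta$, and exponentiating the negative squared loss yields the Gaussian pseudo-likelihood behind the General Bayes posterior.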
Problem

Research questions and friction points this paper is trying to address.

policy learning
Bayesian decision-making
welfare maximization
treatment choice
portfolio selection
Innovation

Methods, ideas, or system contributions that make the work stand out.

General Bayes
policy learning
squared-loss surrogate
PAC-Bayes
loss-based Bayesian updating