Universal priors: solving empirical Bayes via Bayesian inference and pretraining

📅 2026-02-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of enabling models pretrained on synthetic data to effectively solve empirical Bayes problems under arbitrary, unknown test distributions. To this end, it introduces the notion of a “universal prior” and integrates it with a pretrained Transformer architecture to achieve adaptive inference across diverse test distributions. In the Poisson empirical Bayes setting, the method is theoretically shown to attain a near-optimal regret bound of $\widetilde{O}(1/n)$ uniformly over all test distributions. Furthermore, the paper elucidates the model’s ability to generalize beyond the training sequence length through the lens of posterior shrinkage and Bayesian inference, offering principled insights into its robust out-of-distribution performance.

📝 Abstract
We theoretically justify the recent empirical finding of [Teh et al., 2025] that a transformer pretrained on synthetically generated data achieves strong performance on empirical Bayes (EB) problems. We take an indirect approach to this question: rather than analyzing the model architecture or training dynamics, we ask why a pretrained Bayes estimator, trained under a prespecified training distribution, can adapt to arbitrary test distributions. Focusing on Poisson EB problems, we identify the existence of universal priors such that training under these priors yields a near-optimal regret bound of $\widetilde{O}(\frac{1}{n})$ uniformly over all test distributions. Our analysis leverages the classical phenomenon of posterior contraction in Bayesian statistics, showing that the pretrained transformer adapts to unknown test distributions precisely through posterior contraction. This perspective also explains the phenomenon of length generalization, in which the test sequence length exceeds the training length, as the model performs Bayesian inference using a generalized posterior.
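To make the Poisson empirical Bayes setting concrete: each latent mean $\theta_i$ is drawn from an unknown prior, we observe $X_i \sim \mathrm{Poisson}(\theta_i)$, and the goal is to estimate the $\theta_i$ nearly as well as the Bayes estimator that knows the prior. The sketch below is not the paper's pretrained-transformer method; it is the classic Robbins (1956) estimator for this setting, shown under an illustrative Gamma prior chosen purely for simulation, to illustrate how pooling across the sample lets an estimator adapt to an unknown prior:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a Poisson EB problem: latent means theta_i from a prior that is
# unknown to the estimator; observations X_i ~ Poisson(theta_i).
n = 100_000
theta = rng.gamma(shape=2.0, scale=1.5, size=n)  # illustrative prior, not from the paper
x = rng.poisson(theta)

# Robbins estimator: E[theta | X = k] ~ (k + 1) * N(k + 1) / N(k),
# where N(k) is the number of observations equal to k.
counts = np.bincount(x, minlength=x.max() + 2)

def robbins(k: int) -> float:
    if counts[k] == 0:
        return float(k)  # fall back to the MLE when the count is empty
    return (k + 1) * counts[k + 1] / counts[k]

est = np.array([robbins(k) for k in x])
mse_robbins = np.mean((est - theta) ** 2)
mse_mle = np.mean((x - theta) ** 2)  # per-observation MLE is just X_i itself
```

With a large sample, the pooled Robbins estimate typically achieves lower mean squared error than the per-observation MLE, which is the sense in which an EB estimator "adapts" to the unknown prior; the paper's regret bound quantifies this adaptation uniformly over test distributions for the pretrained model.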
Problem

Research questions and friction points this paper is trying to address.

empirical Bayes
universal priors
posterior contraction
pretraining
distribution adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

universal priors
empirical Bayes
posterior contraction
length generalization
pretraining
Nick Cannella
Courant Institute of Mathematical Sciences, NYU, New York City, NY
Anzo Teh
PhD student, MIT
Statistics, information theory, empirical Bayes
Yanjun Han
Assistant Professor, New York University
Statistics, learning theory, information theory
Yury Polyanskiy
Department of EECS, MIT, Cambridge, MA