🤖 AI Summary
This work addresses high-dimensional Poisson empirical Bayes (Poisson-EB) mean estimation under unknown priors, increasing dimensionality, and distributional shift—scenarios where classical methods (e.g., Robbins' estimator and nonparametric maximum likelihood estimation (NPMLE)) suffer performance degradation. We propose the first integration of pretrained Transformers into the classical EB framework, leveraging synthetic-data pretraining and in-context learning to adaptively infer unknown priors. Theoretically, we prove that sufficiently wide Transformers uniformly approximate the oracle estimator achievable under a known prior. Empirically, even a compact Transformer with only 100K parameters substantially outperforms NPMLE across real-world datasets—including NHL, MLB, and BookCorpusOpen—and out-of-distribution (OOD) settings, achieving both lower estimation error and higher computational efficiency. Mechanistic analysis reveals that the model learns a nonlinear shrinkage strategy distinct from conventional EB approaches. Our method establishes a scalable, robust, and interpretable deep learning paradigm for empirical Bayes estimation.
📝 Abstract
This work applies modern AI tools (transformers) to one of the oldest statistical problems: estimating Poisson means in the empirical Bayes (Poisson-EB) setting. In Poisson-EB, a high-dimensional mean vector $\theta$ (with iid coordinates sampled from an unknown prior $\pi$) is estimated on the basis of $X=\mathrm{Poisson}(\theta)$. A transformer model is pre-trained on a set of synthetically generated pairs $(X, \theta)$ and learns to do in-context learning (ICL) by adapting to the unknown $\pi$. Theoretically, we show that a sufficiently wide transformer can achieve vanishing regret with respect to an oracle estimator that knows $\pi$ as the dimension grows to infinity. Practically, we discover that already very small models (100k parameters) are able to outperform the best classical algorithm (non-parametric maximum likelihood, or NPMLE) in both runtime and validation loss, which we compute on out-of-distribution synthetic data as well as real-world datasets (NHL hockey, MLB baseball, BookCorpusOpen). Finally, by using linear probes, we confirm that the transformer's EB estimator appears to work internally in a way distinct from both NPMLE and Robbins' estimator.
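For readers unfamiliar with the classical baseline mentioned above, the following is a minimal sketch of Robbins' frequency-based estimator for the Poisson-EB problem. The function name `robbins` and the Exponential(1) prior used in the demo are illustrative assumptions, not details from the paper:

```python
import numpy as np

def robbins(x):
    """Robbins' nonparametric EB estimator for Poisson means.

    Estimates each theta_i as (x_i + 1) * N(x_i + 1) / N(x_i),
    where N(k) is the number of coordinates of x equal to k.
    No prior is fitted; only empirical counts are used.
    """
    x = np.asarray(x, dtype=int)
    counts = np.bincount(x, minlength=x.max() + 2)
    # counts[x] >= 1 for every observed value, so no division by zero.
    return (x + 1) * counts[x + 1] / counts[x]

# Demo on synthetic data (illustrative prior, not from the paper):
# theta ~ Exponential(1), X ~ Poisson(theta) coordinate-wise.
rng = np.random.default_rng(0)
theta = rng.exponential(1.0, size=100_000)
x = rng.poisson(theta)
mse_robbins = np.mean((robbins(x) - theta) ** 2)
mse_identity = np.mean((x - theta) ** 2)  # naive estimator theta_hat = X
```

In this high-dimensional regime Robbins' estimator shrinks toward the (unknown) posterior mean and attains a markedly lower mean squared error than the naive estimate $\hat\theta = X$, which is the kind of gain the transformer is trained to match or exceed without the erratic behavior Robbins exhibits at rare counts.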