🤖 AI Summary
This work proposes a fully automated approach, with no human in the loop, to constructing and iteratively refining Bayesian models. Leveraging a command-line coding agent, the system autonomously writes Stan models, performs MCMC sampling, and decides whether to accept proposed modifications based on out-of-sample negative log predictive density (NLPD) and diagnostic metrics, including divergences, R-hat, and effective sample size. Remarkably, this method generates diverse, interpretable, and high-performing Bayesian models without relying on search algorithms, external evaluators, or domain-specific instructions. Evaluated across five datasets, it successfully discovers sophisticated structures such as robust regression, heteroscedasticity, contaminated mixtures, hierarchical partial pooling, correlated random effects, and Poisson attack-defense formulations, achieving performance comparable to or exceeding that of black-box models like TabPFN.
📝 Abstract
We present AutoStan, a framework in which a command-line interface (CLI) coding agent autonomously builds and iteratively improves Bayesian models written in Stan. The agent operates in a loop: it writes a Stan model file, executes MCMC sampling, and then decides whether to keep or revert each change based on two complementary feedback signals: the negative log predictive density (NLPD) on held-out data and the sampler's own diagnostics (divergences, R-hat, effective sample size). We evaluate AutoStan on five datasets with diverse modeling structures. On a synthetic regression dataset with outliers, the agent progresses from naive linear regression to a model with Student-t robustness, nonlinear heteroscedastic structure, and an explicit contamination mixture, matching or outperforming TabPFN, a state-of-the-art black-box method, while remaining fully interpretable. Across four additional experiments, the same mechanism discovers hierarchical partial pooling, varying-slope models with correlated random effects, and a Poisson attack/defense model for soccer. No search algorithm, critic module, or domain-specific instructions are needed. This is, to our knowledge, the first demonstration that a CLI coding agent can autonomously write and iteratively improve Stan code for diverse Bayesian modeling problems.
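The keep-or-revert decision described above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the function names, the diagnostic thresholds, and the exact acceptance rule are all assumptions; the abstract only states that held-out NLPD and the sampler diagnostics (divergences, R-hat, effective sample size) jointly drive the decision.

```python
# Hypothetical sketch of an AutoStan-style accept/revert rule.
# All names and thresholds are illustrative assumptions, not the paper's code.

def nlpd(log_pred_densities):
    """Average negative log predictive density over held-out points.

    log_pred_densities[i] is log p(y_i | x_i, posterior draws) for
    held-out point i, e.g. computed via log-sum-exp over MCMC draws.
    """
    return -sum(log_pred_densities) / len(log_pred_densities)

def diagnostics_ok(divergences, rhat_max, ess_min,
                   rhat_tol=1.01, ess_tol=400):
    """Health check on the three diagnostics named in the abstract.

    The thresholds (R-hat < 1.01, ESS > 400, zero divergences) are
    common Stan community defaults, assumed here for illustration.
    """
    return divergences == 0 and rhat_max < rhat_tol and ess_min > ess_tol

def accept_change(old_nlpd, new_nlpd, divergences, rhat_max, ess_min):
    """Keep a proposed model edit only if held-out NLPD improves
    and sampling stayed healthy; otherwise revert."""
    return (new_nlpd < old_nlpd
            and diagnostics_ok(divergences, rhat_max, ess_min))

# Example: lower NLPD with clean diagnostics -> keep the edit.
print(accept_change(1.42, 1.31, divergences=0, rhat_max=1.003, ess_min=850))
# A fit with divergences is reverted even if NLPD improved.
print(accept_change(1.42, 1.31, divergences=7, rhat_max=1.003, ess_min=850))
```

In this sketch the two signals play complementary roles: NLPD measures predictive quality on held-out data, while the diagnostics guard against accepting a model whose posterior was not actually explored correctly.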