🤖 AI Summary
Bayesian model selection and averaging rely on the marginal likelihood, which is intractable to compute exactly for complex models; standard estimators such as bridge sampling can yield high-variance approximations. This paper proposes a low-overhead diagnostic framework for assessing the reliability of marginal likelihood estimates. It adds Pareto-$\hat{k}$ and block reshuffling diagnostics to the bridge sampling pipeline, so that the Monte Carlo standard error (MCSE) can be estimated and checked without repeatedly re-running full posterior inference. The method combines bridge sampling, MCSE estimation, and these two diagnostics. Its behavior is demonstrated on increasingly difficult simulated posteriors and on real posteriors from posteriordb, giving a practical tool for judging whether marginal likelihood estimates are reliable enough for Bayesian model comparison.
📝 Abstract
In Bayesian statistics, the marginal likelihood is used for model selection and averaging, yet it is often difficult to compute accurately for complex models. Approaches such as bridge sampling, while effective, may produce highly variable estimates. We present how to estimate the Monte Carlo standard error (MCSE) for bridge sampling, and how to diagnose the reliability of the MCSE estimates using Pareto-$\hat{k}$ and block reshuffling diagnostics without repeatedly re-running full posterior inference. We demonstrate the behavior of these diagnostics on increasingly difficult simulated posteriors and on many real posteriors from the posteriordb database.
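To make the setting concrete, the sketch below estimates a log marginal likelihood with the iterative bridge sampling scheme of Meng and Wong, using a normal proposal fitted to half of the posterior draws, and attaches a rough MCSE. It assumes a one-dimensional conjugate normal model, chosen only because its exact marginal likelihood is known. The MCSE here comes from a generic non-overlapping block bootstrap over the posterior draws: this is a swapped-in stand-in, not the paper's MCSE estimator, and no Pareto-$\hat{k}$ diagnostic is implemented. All function names (`make_toy_problem`, `bridge_log_evidence`, `block_bootstrap_mcse`) are hypothetical.

```python
import numpy as np

def log_normal_pdf(x, mean, sd):
    return -0.5 * np.log(2 * np.pi * sd**2) - 0.5 * ((x - mean) / sd) ** 2

def make_toy_problem(n=20, tau=1.0, seed=0):
    """Hypothetical toy model: y_i ~ N(theta, 1), theta ~ N(0, tau^2)."""
    rng = np.random.default_rng(seed)
    y = rng.normal(0.7, 1.0, size=n)
    post_var = 1.0 / (n + 1.0 / tau**2)
    post_mean = post_var * y.sum()
    # Exact log evidence: marginally y ~ N(0, I + tau^2 * 1 1^T)
    cov = np.eye(n) + tau**2 * np.ones((n, n))
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    log_evidence = -0.5 * (n * np.log(2 * np.pi) + logdet + quad)
    return y, tau, post_mean, np.sqrt(post_var), log_evidence

def log_unnorm_post(theta, y, tau):
    # log likelihood + log prior, vectorized over an array of theta values
    loglik = log_normal_pdf(y[None, :], theta[:, None], 1.0).sum(axis=1)
    return loglik + log_normal_pdf(theta, 0.0, tau)

def bridge_log_evidence(post_draws, prop_draws, logq, logg, tol=1e-10, max_iter=1000):
    """Iterative (Meng-Wong) bridge sampling estimate of log marginal likelihood."""
    l1 = logq(post_draws) - logg(post_draws)  # log q/g at posterior draws
    l2 = logq(prop_draws) - logg(prop_draws)  # log q/g at proposal draws
    lstar = np.median(l1)                     # shift for numerical stability
    e1, e2 = np.exp(l1 - lstar), np.exp(l2 - lstar)
    n1, n2 = len(l1), len(l2)
    s1, s2 = n1 / (n1 + n2), n2 / (n1 + n2)
    r = 1.0
    for _ in range(max_iter):
        num = np.mean(e2 / (s1 * e2 + s2 * r))
        den = np.mean(1.0 / (s1 * e1 + s2 * r))
        r_new = num / den
        converged = abs(np.log(r_new) - np.log(r)) < tol
        r = r_new
        if converged:
            break
    return np.log(r) + lstar  # undo the shift

def block_bootstrap_mcse(post_draws, prop_draws, logq, logg, n_blocks=20, n_boot=100, seed=1):
    """Stand-in MCSE: resample contiguous blocks of posterior draws (and iid proposal draws)."""
    rng = np.random.default_rng(seed)
    blocks = np.array_split(post_draws, n_blocks)
    estimates = []
    for _ in range(n_boot):
        idx = rng.integers(0, n_blocks, size=n_blocks)
        resampled_post = np.concatenate([blocks[i] for i in idx])
        resampled_prop = rng.choice(prop_draws, size=len(prop_draws), replace=True)
        estimates.append(bridge_log_evidence(resampled_post, resampled_prop, logq, logg))
    return np.std(estimates, ddof=1)

y, tau, m, s, exact = make_toy_problem()
rng = np.random.default_rng(2)
draws = rng.normal(m, s, size=4000)          # stand-in for MCMC output (iid here)
fit, held_out = draws[:2000], draws[2000:]   # fit proposal on one half, bridge on the other
g_mean, g_sd = fit.mean(), fit.std(ddof=1)
logg = lambda th: log_normal_pdf(th, g_mean, g_sd)
logq = lambda th: log_unnorm_post(th, y, tau)
prop = rng.normal(g_mean, g_sd, size=2000)
est = bridge_log_evidence(held_out, prop, logq, logg)
mcse = block_bootstrap_mcse(held_out, prop, logq, logg)
print(f"exact {exact:.4f}  bridge {est:.4f}  MCSE ~ {mcse:.4f}")
```

Because the proposal nearly matches this Gaussian posterior, the estimate should sit very close to the exact value; the harder posteriors the abstract refers to are exactly where such a naive bootstrap MCSE may itself become unreliable and call for diagnostics.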