🤖 AI Summary
Under model misspecification, Bayesian model comparison (BMC) with neural surrogate models yields distorted results. To address this, we propose a calibration method based on a self-consistency loss: during simulation-based training, unlabeled real-world data are used to enforce probabilistic consistency of the surrogate's outputs under data perturbations, thereby improving robustness to distributional shift relative to the true data-generating process. The method requires no access to the true likelihood and is compatible with standard BMC estimators such as bridge sampling. Experiments show that when analytic likelihoods are available, the proposed loss substantially improves both the calibration and the ranking reliability of Bayesian evidence estimates; gains are marginal when the likelihood itself is a neural surrogate, indicating that the approach is most useful when exact likelihoods can be evaluated. Our key contribution is the first integration of the self-consistency principle into amortized BMC surrogate training, enhancing trustworthy inference under model misspecification.
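To make the self-consistency idea concrete: by Bayes' rule, the evidence implied by prior, likelihood, and (surrogate) posterior is identical for every parameter value, so its variability across posterior draws on real observations can serve as an unsupervised training signal. The sketch below uses our own notation and one common variance-based formulation; the paper's exact loss may differ.

```latex
% Self-consistency of the model evidence (sketch; notation ours).
% For any parameter value \theta, Bayes' rule implies
\[
  p(y \mid M) \;=\; \frac{p(\theta \mid M)\, p(y \mid \theta, M)}{p(\theta \mid y, M)}
  \qquad \text{for all } \theta .
\]
% With a surrogate posterior q_\phi(\theta \mid y, M), the implied log-evidence
\[
  \log \hat{p}_\phi(y \mid M; \theta)
  \;=\; \log p(\theta \mid M) + \log p(y \mid \theta, M) - \log q_\phi(\theta \mid y, M)
\]
% should not depend on \theta; a self-consistency loss can penalize its spread
% across posterior draws for unlabeled real observations y, e.g.
\[
  \mathcal{L}_{\mathrm{SC}}(y)
  \;=\; \operatorname{Var}_{\theta_k \sim q_\phi(\cdot \mid y, M)}
        \big[ \log \hat{p}_\phi(y \mid M; \theta_k) \big].
\]
```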
📝 Abstract
Amortized Bayesian model comparison (BMC) enables fast probabilistic ranking of models via simulation-based training of neural surrogates. However, the reliability of neural surrogates deteriorates when simulation models are misspecified, which is precisely the setting in which model comparison is most needed. We therefore supplement simulation-based training with a self-consistency (SC) loss on unlabeled real data to improve BMC estimates under empirical distribution shifts. Using a numerical experiment and two case studies with real data, we compare amortized evidence estimates with and without SC against analytic or bridge sampling benchmarks. SC improves calibration under model misspecification when analytic likelihoods are available. However, it offers limited gains with neural surrogate likelihoods, making it most practical for trustworthy BMC when likelihoods are exact.
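For concreteness, here is a minimal PyTorch-style sketch of how a variance-based SC penalty on unlabeled real data could supplement a simulation-based training loss. All names (`sc_loss`, `log_prior`, `log_lik`, `posterior`, `lambda_sc`) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a self-consistency (SC) penalty for amortized BMC.
# The interfaces below are placeholders, not the paper's API.
import torch

def sc_loss(y_real, log_prior, log_lik, posterior, num_draws=8):
    """Variance of the implied log-evidence across surrogate posterior draws.

    y_real    : batch of unlabeled real observations, shape (B, ...)
    log_prior : callable theta -> log p(theta), returning shape (K, B)
    log_lik   : callable (theta, y) -> log p(y | theta), returning shape (K, B)
    posterior : surrogate with .sample(y, K) -> (K, B, D) and .log_prob(theta, y) -> (K, B)
    """
    theta = posterior.sample(y_real, num_draws)            # (K, B, D)
    log_evidence = (
        log_prior(theta)                                   # (K, B)
        + log_lik(theta, y_real)                           # (K, B)
        - posterior.log_prob(theta, y_real)                # (K, B)
    )
    # If the surrogate were exact, log_evidence would not depend on theta;
    # penalize its spread across the K posterior draws.
    return log_evidence.var(dim=0).mean()

# Possible use during training (lambda_sc is a tuning weight):
# total_loss = simulation_based_loss + lambda_sc * sc_loss(y_real, log_prior, log_lik, posterior)
```

Note that `log_lik` may be either an analytic likelihood or itself a neural surrogate; the paper's findings suggest the penalty is most beneficial in the former case.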