Adaptive sequential Monte Carlo for automated cross validation in structural Bayesian hierarchical models

📅 2025-01-13

🤖 AI Summary

Structural cross-validation (CV) for Bayesian hierarchical models faces two obstacles: standard importance sampling becomes unreliable when non-standard case-deletion schemes substantially change the posterior geometry, and the usual fallback, re-running MCMC for every holdout set, is computationally expensive. This paper proposes an approximate CV framework based on adaptive sequential Monte Carlo (SMC). The sampler automatically constructs intermediate bridging distributions that lead to the case-deleted posterior(s), and it adaptively selects Markov kernels, intervening with parallelizable MCMC re-runs only when necessary. The method supports a range of holdout schemes, including leave-group-out CV, group K-fold CV, and sequential one-step-ahead validation. Its practical utility is demonstrated on three real-world applications.
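To make the idea of a bridging sequence concrete, here is one common way to write such a path. It is a sketch under stated assumptions, not necessarily the construction used in the paper: the symbols $\theta$, $y_s$, $y_{-s}$, and $\phi_t$ are introduced here for illustration, the held-out block $y_s$ is assumed conditionally independent of the retained data $y_{-s}$ given $\theta$, and the geometric (tempering) path is only one possible choice.

```latex
% Case-deleted posterior expressed through the full-data posterior
p(\theta \mid y_{-s}) \;\propto\; \frac{p(\theta \mid y)}{p(y_s \mid \theta)}

% Geometric bridge from the full-data posterior (\phi_0 = 0)
% to the case-deleted posterior (\phi_T = 1)
\pi_t(\theta) \;\propto\; p(\theta \mid y)\, p(y_s \mid \theta)^{-\phi_t},
\qquad 0 = \phi_0 < \phi_1 < \dots < \phi_T = 1

% Incremental importance weight when moving from \pi_{t-1} to \pi_t
w_t(\theta) \;\propto\; p(y_s \mid \theta)^{-(\phi_t - \phi_{t-1})}
```

With a single step ($T = 1$) this reduces to plain importance sampling, whose weights $1 / p(y_s \mid \theta)$ can be highly variable when removing $y_s$ changes the posterior substantially. Smaller steps keep each reweighting mild.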

📝 Abstract

Importance sampling (IS) is widely used for approximate Bayesian cross validation (CV) due to its efficiency, requiring only the re-weighting of a single set of posterior draws. With structural Bayesian hierarchical models, vanilla IS can produce unreliable results, as out-of-sample replication may involve non-standard case-deletion schemes which significantly alter the posterior geometry. This inevitably necessitates computationally expensive re-runs of Markov chain Monte Carlo (MCMC), making structural CV impracticable. To address this challenge, we consider sampling from a sequence of posteriors leading to the case-deleted posterior(s) via adaptive sequential Monte Carlo (SMC). We design the sampler to (a) support a broad range of structural CV schemes, (b) enhance efficiency by adaptively selecting Markov kernels, intervening in parallelizable MCMC re-runs only when necessary, and (c) streamline the workflow by automating the design of intermediate bridging distributions. Its practical utility is demonstrated through three real-world applications involving three types of predictive model assessments: leave-group-out CV, group $K$-fold CV, and sequential one-step-ahead validation.
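The automated design of intermediate distributions mentioned in (c) can be illustrated with a generic adaptive-tempering step. The sketch below is not the authors' algorithm: it assumes a geometric path in which the held-out log-likelihood is removed gradually through an exponent `phi` in [0, 1], and it picks the next `phi` by bisection so that the effective sample size (ESS) of the incremental weights stays above a target fraction. The function names `ess` and `next_temperature` and the parameter `target_frac` are hypothetical.

```python
import numpy as np


def ess(log_w):
    """Effective sample size from unnormalized log-weights."""
    w = np.exp(log_w - np.max(log_w))  # subtract max for numerical stability
    return w.sum() ** 2 / (w ** 2).sum()


def next_temperature(loglik_held, phi_prev, target_frac=0.5, tol=1e-6):
    """Pick the next tempering exponent by bisection (illustrative assumption).

    loglik_held: log p(y_s | theta_i) for each particle i (held-out block y_s).
    Returns a phi in [phi_prev, 1] whose incremental weights
    exp(-(phi - phi_prev) * loglik_held) keep ESS >= target_frac * N,
    or 1.0 if the final target can be reached in one step.
    """
    n = loglik_held.shape[0]

    def ess_at(phi):
        return ess(-(phi - phi_prev) * loglik_held)

    if ess_at(1.0) >= target_frac * n:
        return 1.0

    lo, hi = phi_prev, 1.0  # invariant: ESS at lo is acceptable, at hi it is not
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ess_at(mid) >= target_frac * n:
            lo = mid
        else:
            hi = mid
    return lo


# Usage with synthetic held-out log-likelihoods (made-up numbers)
rng = np.random.default_rng(0)
loglik = 5.0 * rng.normal(size=1000)
phi_1 = next_temperature(loglik, phi_prev=0.0)
print(phi_1, ess(-phi_1 * loglik))
```

In a full sampler, each such step would be followed by resampling and, where the particles have degenerated, an MCMC move targeting the current intermediate distribution. Those parts are omitted here.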

Problem

Research questions and friction points this paper is trying to address.

Unreliable importance sampling in hierarchical Bayesian models

Costly MCMC re-runs for structural cross-validation schemes

Need to automate the design of intermediate bridging distributions

Innovation

Methods, ideas, or system contributions that make the work stand out.

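Adaptive sequential Monte Carlo for structural CV

Adaptive selection of Markov kernels, with MCMC re-runs only when necessary

Automated design of intermediate bridging distributions

Supports leave-group-out CV, group K-fold CV, and sequential one-step-ahead validation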

🔎 Similar Papers

Amortized Bayesian Multilevel Models