Adaptive sequential Monte Carlo for automated cross validation in structural Bayesian hierarchical models

📅 2025-01-13
🤖 AI Summary
Structural cross-validation (CV) for Bayesian hierarchical models faces two obstacles: vanilla importance sampling becomes unreliable when non-standard case-deletion schemes significantly alter the posterior geometry, and falling back on repeated full MCMC runs is computationally prohibitive. This paper proposes an approximate CV framework based on adaptive sequential Monte Carlo (SMC). The sampler moves through a sequence of automatically constructed bridging distributions from the full-data posterior to the case-deleted posterior(s), adaptively selects Markov kernels, and intervenes with parallelizable MCMC re-runs only when necessary. It supports a broad range of holdout schemes, including leave-group-out CV, group K-fold CV, and sequential one-step-ahead validation. Its practical utility is demonstrated on three real-world applications.

📝 Abstract
Importance sampling (IS) is widely used for approximate Bayesian cross validation (CV) due to its efficiency, requiring only the re-weighting of a single set of posterior draws. With structural Bayesian hierarchical models, vanilla IS can produce unreliable results, as out-of-sample replication may involve non-standard case-deletion schemes which significantly alter the posterior geometry. This inevitably necessitates computationally expensive re-runs of Markov chain Monte Carlo (MCMC), making structural CV impracticable. To address this challenge, we consider sampling from a sequence of posteriors leading to the case-deleted posterior(s) via adaptive sequential Monte Carlo (SMC). We design the sampler to (a) support a broad range of structural CV schemes, (b) enhance efficiency by adaptively selecting Markov kernels, intervening in parallelizable MCMC re-runs only when necessary, and (c) streamline the workflow by automating the design of intermediate bridging distributions. Its practical utility is demonstrated through three real-world applications involving three types of predictive model assessments: leave-group-out CV, group $K$-fold CV, and sequential one-step-ahead validation.
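As a rough illustration of the re-weighting idea in the abstract, the sketch below applies vanilla IS case deletion to a toy model (normal mean, known variance, flat prior). The toy model, the data, and all function names are assumptions made for illustration and are not taken from the paper. Each full-posterior draw gets a weight proportional to the inverse likelihood of the held-out observations, and the effective sample size (ESS) indicates how well the re-weighted draws represent the case-deleted posterior.

```python
import math
import random


def normal_logpdf(y, mu, sigma):
    """Log density of N(mu, sigma^2) evaluated at y."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - 0.5 * ((y - mu) / sigma) ** 2


def logsumexp(xs):
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def case_deletion_weights(draws, held_out, sigma):
    """Normalized IS weights, w_s proportional to 1 / p(y_held | theta_s)."""
    log_w = [-sum(normal_logpdf(y, th, sigma) for y in held_out) for th in draws]
    norm = logsumexp(log_w)
    return [math.exp(lw - norm) for lw in log_w]


def effective_sample_size(weights):
    """ESS of normalized weights; equals len(weights) when they are uniform."""
    return 1.0 / sum(w * w for w in weights)


# Hypothetical data: twelve observations near 0 and a group of four near 3.
data = [0.1, -0.3, 0.2, -0.1, 0.4, -0.2, 0.0, 0.3, -0.4, 0.1, 0.2, -0.1,
        3.1, 2.8, 3.3, 2.9]
sigma = 1.0
n = len(data)

# With a flat prior and known sigma, the full posterior is N(mean(y), sigma^2 / n).
rng = random.Random(0)
draws = [rng.gauss(sum(data) / n, sigma / math.sqrt(n)) for _ in range(2000)]

w_single = case_deletion_weights(draws, data[:1], sigma)   # drop one typical point
w_group = case_deletion_weights(draws, data[-4:], sigma)   # drop the whole group
ess_single = effective_sample_size(w_single)
ess_group = effective_sample_size(w_group)
print(f"ESS, single deletion: {ess_single:.0f} of {len(draws)}")
print(f"ESS, group deletion:  {ess_group:.0f} of {len(draws)}")
```

In this toy setting, removing the whole group shifts the posterior far from the full-data draws, so the group-deletion ESS collapses while the single-deletion ESS stays close to the number of draws. That collapse is the kind of failure that motivates bridging through intermediate posteriors.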
Problem

Research questions and friction points this paper is trying to address.

Unreliable importance sampling in hierarchical Bayesian models
Costly MCMC re-runs for structured cross-validation scenarios
Need for automated path construction in posterior sampling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive sequential Monte Carlo for structured CV
Automated bridging-path construction, with MCMC re-runs triggered only when necessary
Supports diverse cross-validation designs efficiently
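The ideas listed above can be sketched with a generic adaptive-tempering SMC sampler. This is a minimal sketch under stated assumptions, not the paper's algorithm: it uses a toy normal-mean model with a flat prior, the bridging path p(θ | y) · p(y_held | θ)^(−λ) with λ moving from 0 to 1, bisection on the ESS to pick each next λ, and a random-walk Metropolis move after every resampling step (the paper instead intervenes with MCMC only when necessary). All names, the data, and the tuning constants are hypothetical.

```python
import math
import random
import statistics


def normal_logpdf(y, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - 0.5 * ((y - mu) / sigma) ** 2


def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ess_fraction(log_w):
    """ESS divided by the number of particles, from unnormalized log weights."""
    norm = logsumexp(log_w)
    return 1.0 / (len(log_w) * sum(math.exp(2 * (lw - norm)) for lw in log_w))


def next_temperature(held_ll, lam, target, tol=1e-6):
    """Largest step (found by bisection) that keeps the ESS fraction >= target."""
    def frac(delta):
        return ess_fraction([-delta * ll for ll in held_ll])

    if frac(1.0 - lam) >= target:
        return 1.0  # can jump straight to the case-deleted posterior
    lo, hi = 0.0, 1.0 - lam
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if frac(mid) >= target:
            lo = mid
        else:
            hi = mid
    return lam + lo


def smc_case_deletion(data, held_idx, sigma=1.0, n_particles=500,
                      target=0.5, n_moves=3, seed=0):
    rng = random.Random(seed)
    n = len(data)
    held = [data[i] for i in held_idx]

    def loglik(ys, th):
        return sum(normal_logpdf(y, th, sigma) for y in ys)

    # Start from exact full-posterior draws (flat prior, known sigma).
    particles = [rng.gauss(sum(data) / n, sigma / math.sqrt(n))
                 for _ in range(n_particles)]
    lam, schedule = 0.0, [0.0]
    while lam < 1.0:
        held_ll = [loglik(held, th) for th in particles]
        new_lam = next_temperature(held_ll, lam, target)
        # Particles are equally weighted here, so incremental weights suffice.
        log_w = [-(new_lam - lam) * ll for ll in held_ll]
        norm = logsumexp(log_w)
        weights = [math.exp(lw - norm) for lw in log_w]
        particles = rng.choices(particles, weights=weights, k=n_particles)
        lam = new_lam
        schedule.append(lam)

        def log_target(th):
            # Bridging density: full-data likelihood with held-out part tempered away.
            return loglik(data, th) - lam * loglik(held, th)

        step = 2.38 * statistics.pstdev(particles)  # scale proposals to particle spread
        for _ in range(n_moves):
            for s in range(n_particles):
                prop = particles[s] + rng.gauss(0.0, step)
                diff = log_target(prop) - log_target(particles[s])
                if rng.random() < math.exp(min(0.0, diff)):
                    particles[s] = prop
    return particles, schedule


# Hypothetical data: the last four observations form the held-out group.
data = [0.1, -0.3, 0.2, -0.1, 0.4, -0.2, 0.0, 0.3, -0.4, 0.1, 0.2, -0.1,
        3.1, 2.8, 3.3, 2.9]
held_idx = [12, 13, 14, 15]
particles, schedule = smc_case_deletion(data, held_idx)
kept = [y for i, y in enumerate(data) if i not in held_idx]
exact_mean = sum(kept) / len(kept)  # case-deleted posterior mean in this toy model
print("temperature schedule:", [round(l, 3) for l in schedule])
print("SMC mean:", round(statistics.mean(particles), 3), "exact:", round(exact_mean, 3))
```

The ESS target controls how many bridging distributions are created: a higher target yields smaller temperature steps and more intermediate posteriors. Because the toy model is conjugate, the particle mean can be checked against the exact case-deleted posterior mean.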