🤖 AI Summary
Does repeated Bayesian interim analysis compromise inferential reliability? This study systematically evaluates its impact on bias, mean squared error, credible interval coverage probability, false discovery rate, family-wise error rate, and statistical power. Through theoretical derivation and large-scale simulation—encompassing single-arm and two-arm randomized controlled trials, diverse endpoint types (e.g., binary, time-to-event), and varied prior specifications—we demonstrate that even with correctly specified priors, unadjusted repeated Bayesian analysis substantially distorts operating characteristics: it inflates type I error, reduces credible interval coverage, and induces non-negligible estimation bias. Our key contribution is the first rigorous quantification of the prevalence and severity of the Bayesian multiplicity problem. We establish that adaptive Bayesian designs must jointly incorporate multiplicity adjustment and careful prior elicitation. These findings provide critical methodological foundations for regulatory guidance on Bayesian clinical trials.
📝 Abstract
Interim analyses are commonly used in clinical trials to enable early stopping for efficacy, futility, or safety. While their impact on frequentist operating characteristics is well studied and broadly understood, the effect of repeated Bayesian interim analyses, when conducted without appropriate multiplicity adjustment, remains an area of active debate. In this article, we provide both theoretical justification and numerical evidence illustrating how such analyses affect key inferential properties, including bias, mean squared error, the coverage probability of posterior credible intervals, the false discovery rate, the family-wise error rate, and power. Our findings demonstrate that Bayesian interim analyses can substantially alter a trial's operating characteristics, even when the prior used for Bayesian inference is correctly specified and aligned with the data-generating process. Extensive simulation studies, covering a variety of endpoints, trial designs (single-arm and two-arm randomized controlled trials), and scenarios with both correctly specified and misspecified priors, support the theoretical insights. Collectively, these results underscore the necessity of appropriate adjustment, thoughtful prior specification, and comprehensive evaluation to ensure valid and reliable inference in Bayesian adaptive trial designs.
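The inflation the abstract describes can be illustrated with a minimal simulation sketch. This is not the paper's actual simulation study; the design, prior, and threshold below are illustrative assumptions: a single-arm trial with a binary endpoint, a uniform Beta(1, 1) prior, and a stopping rule that declares efficacy whenever the posterior probability P(theta > 0.5 | data) exceeds 0.975 at any look. Data are generated under the null (p = 0.5), so every efficacy declaration is a false positive.

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)

def trial_rejects(n_looks, n_per_stage, p_true=0.5, threshold=0.975):
    """One single-arm binary-endpoint trial with repeated Bayesian looks.

    Declares efficacy (and stops) as soon as the posterior probability
    P(theta > 0.5 | data) under a Beta(1, 1) prior exceeds `threshold`.
    """
    successes = n = 0
    for _ in range(n_looks):
        successes += rng.binomial(n_per_stage, p_true)
        n += n_per_stage
        # Conjugate update: posterior is Beta(1 + successes, 1 + failures).
        if beta.sf(0.5, 1 + successes, 1 + n - successes) > threshold:
            return True
    return False

# Same total sample size (100 subjects), increasing number of unadjusted looks.
n_sim = 20_000
err_rate = {}
for n_looks, n_per_stage in [(1, 100), (5, 20), (10, 10)]:
    err_rate[n_looks] = np.mean(
        [trial_rejects(n_looks, n_per_stage) for _ in range(n_sim)]
    )
    print(f"{n_looks:2d} look(s): false positive rate ≈ {err_rate[n_looks]:.3f}")
```

With a single final analysis this rule behaves like a one-sided test at roughly the 2.5% level; holding the total sample size fixed while adding unadjusted interim looks pushes the false positive rate well above that, mirroring the type I error inflation quantified in the paper.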