🤖 AI Summary
This study addresses the risk of downward bias in population prevalence estimates from non-random samples when using covariate adjustment. Applying a Bayesian hierarchical model to estimate SARS-CoV-2 seroprevalence in Australian urban populations—and complementing it with simulation studies—the authors demonstrate that in high-dimensional adjustment models, weakly regularizing priors intensify partial pooling effects, which, through a feedback mechanism involving test specificity, systematically deflate prevalence estimates. The work elucidates how the interplay between model dimensionality and prior strength induces such bias, proposes both short-term modeling corrections and long-term methodological improvements, and offers a diagnostic framework for identifying anomalous results in complex adjustment models.
📝 Abstract
When estimating population prevalence from a non-random sample, it is important to adjust for differences between sample and population. However, adjustment for multiple factors requires analysis that can be difficult to understand and validate. In this manuscript, we explore an unexpected downward trend of estimates when covariates are added sequentially to a Bayesian hierarchical model for the estimation of the prevalence of SARS-CoV-2 specific antibodies in an Australian city in late 2020.
We compare our data analysis to results from a simulation study to understand four potential contributors to this effect: (i) correction for differences between sample and population, (ii) rare-events bias in logistic regression, (iii) inclusion of the uncertainty of test sensitivity and specificity in a multilevel model, and (iv) increasing model dimensionality. We find that weak prior distributions on the logistic regression coefficients lead to a systematic increase in the amount of partial pooling across adjustment cells (the prior becomes stronger as model dimensionality increases), which in turn feeds through to the estimated assay specificity and then feeds back into the model, lowering the estimated prevalence.
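The role of the specificity estimate can be illustrated with the standard misclassification relation (a minimal sketch under simplified assumptions, not the authors' hierarchical model): the expected positive-test rate is p_obs = sens · p + (1 − spec) · (1 − p), and inverting this relation (the Rogan–Gladen correction) shows how a lower specificity estimate deflates the implied prevalence when true prevalence is low.

```python
def positive_rate(p, sens, spec):
    """Expected fraction of positive tests given true prevalence p,
    assay sensitivity, and assay specificity."""
    return sens * p + (1 - spec) * (1 - p)

def rogan_gladen(p_obs, sens, spec):
    """Invert the misclassification relation to recover prevalence
    from the raw positive-test rate (Rogan-Gladen correction)."""
    return (p_obs + spec - 1) / (sens + spec - 1)

# Hypothetical low-prevalence setting, chosen for illustration only.
p_true, sens, spec = 0.005, 0.90, 0.995
p_obs = positive_rate(p_true, sens, spec)

# With the correct specificity the correction recovers p_true...
print(round(rogan_gladen(p_obs, sens, spec), 4))   # 0.005
# ...but a specificity estimate pulled slightly downward attributes
# more positives to false positives and deflates the prevalence.
print(round(rogan_gladen(p_obs, sens, 0.992), 4))  # 0.0017
```

At a prevalence of 0.5%, a shift in estimated specificity of only 0.3 percentage points cuts the implied prevalence by roughly two thirds, which is why small feedback effects on the specificity estimate can produce the pronounced downward trend the abstract describes.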
Our paper contributes three elements: (i) immediate and longer-term recommendations for using these types of models, (ii) simulation studies to explore the impact of the contributors to this effect, and (iii) a worked example of investigation of unexpected results in a model with multiple adjustment factors.