🤖 AI Summary
This study addresses positivity violations that arise from design-induced missingness in medical and public health research. The motivating example is systolic blood pressure (SBP), which the 2017-2018 NHANES never measured for children aged 2-7 years. The authors apply a recently proposed synthesis framework that combines a statistical model, fit where SBP was observed, with an external mathematical model that supplies information for the ages where it was not. Standard imputation and weighting methods can bridge such a gap only through restrictive, untestable assumptions; the synthesis approach instead integrates external information. Applied to U.S. children and adolescents aged 2-17 years, the method estimates a mean SBP of 100.5 mmHg (95% CI: 99.9, 101.0), notably lower than estimates from a complete-case analysis or from extrapolating a statistical model. A diagnostic that evaluates the mathematical model in the positive region, where SBP was observed, supports the synthesis results. The main contribution is a worked illustration of how external information can be integrated to address nonpositivity.
📝 Abstract
Introduction: Missing data is a challenge in medical research. Accounting for missing data by imputing or weighting conditional on covariates requires that the variable with missingness be observed at least some of the time at every unique combination of covariate values. This requirement is referred to as positivity, and violations can result in bias. Here, we review a novel approach to addressing positivity violations in the context of systolic blood pressure. Methods: To illustrate the proposed approach, we estimate the mean systolic blood pressure among children and adolescents aged 2-17 years in the United States using data from the 2017-2018 National Health and Nutrition Examination Survey (NHANES). Because blood pressure was never measured for those aged 2-7 years, there is a positivity violation by design. Using a recently proposed synthesis of statistical and mathematical models, we integrate external information with NHANES to address our motivating question. Results: With the synthesis model, the estimated mean systolic blood pressure was 100.5 mmHg (95% confidence interval: 99.9, 101.0), which is notably lower than the estimate from either a complete-case analysis or extrapolation from a statistical model. The synthesis results were supported by a diagnostic that evaluated the performance of the mathematical model in the positive region. Conclusion: Positivity violations pose a threat to quantitative medical research, and standard approaches to addressing nonpositivity rely on restrictive, untestable assumptions. A synthesis model like the one detailed here offers a viable alternative through the integration of external information.
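The general idea of a synthesis estimator can be sketched in a few lines. The example below is an illustration under stated assumptions, not the authors' implementation: the data are simulated rather than drawn from NHANES, the survey weights are made up, the statistical model is reduced to weighted age-specific means, and `math_model_sbp` is a hypothetical stand-in for the external mathematical model, which the abstract does not specify.

```python
import random


def simulate_survey(n=2000, seed=0):
    """Simulated records (NOT NHANES): ages 2-17, SBP missing by design below age 8."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.randint(2, 17)
        weight = rng.uniform(0.5, 2.0)  # hypothetical survey weight
        true_sbp = 85.0 + 1.5 * age + rng.gauss(0.0, 8.0)  # made-up data-generating model
        sbp = true_sbp if age >= 8 else None  # never measured for ages 2-7
        rows.append({"age": age, "weight": weight, "sbp": sbp})
    return rows


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def nonpositive_ages(rows):
    """Ages at which SBP is never observed, i.e. where positivity fails."""
    observed = {r["age"] for r in rows if r["sbp"] is not None}
    return sorted({r["age"] for r in rows} - observed)


def fit_stratum_means(rows):
    """Simplified statistical model: weighted mean SBP at each observed age."""
    by_age = {}
    for r in rows:
        if r["sbp"] is not None:
            by_age.setdefault(r["age"], []).append(r)
    return {
        age: weighted_mean([r["sbp"] for r in rs], [r["weight"] for r in rs])
        for age, rs in by_age.items()
    }


def math_model_sbp(age):
    """Hypothetical external model of mean SBP by age (assumed for illustration)."""
    return 86.0 + 1.4 * age


def complete_case_mean(rows):
    """Weighted mean among records with observed SBP only."""
    obs = [r for r in rows if r["sbp"] is not None]
    return weighted_mean([r["sbp"] for r in obs], [r["weight"] for r in obs])


def synthesis_mean(rows):
    """Statistical model where SBP is observed, external model where it is not."""
    gap = set(nonpositive_ages(rows))
    fitted = fit_stratum_means(rows)
    preds = [
        math_model_sbp(r["age"]) if r["age"] in gap else fitted[r["age"]]
        for r in rows
    ]
    return weighted_mean(preds, [r["weight"] for r in rows])


def diagnostic_max_gap(rows):
    """Largest disagreement between the two models in the positive region."""
    fitted = fit_stratum_means(rows)
    return max(abs(math_model_sbp(age) - mean) for age, mean in fitted.items())


rows = simulate_survey()
print("Nonpositive ages:", nonpositive_ages(rows))
print("Complete-case mean:", round(complete_case_mean(rows), 1))
print("Synthesis mean:", round(synthesis_mean(rows), 1))
print("Max diagnostic gap:", round(diagnostic_max_gap(rows), 1))
```

In this simulation SBP rises with age, so dropping the youngest children pushes the complete-case mean upward, while the synthesis estimate fills the unobserved ages from the external model. The diagnostic mirrors the check described in the abstract: the external model can only be compared against data in the positive region, so a small gap there lends support to the model but does not verify it at ages 2-7.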