Accounting for Missing Data in Public Health Research Using a Synthesis of Statistical and Mathematical Models

📅 2025-03-04
🤖 AI Summary
This study addresses positivity violations arising from missingness by design in public health research. The motivating example is systolic blood pressure (SBP), which was never measured for children aged 2-7 years in the 2017-2018 NHANES. The authors review a recently proposed synthesis of statistical and mathematical models that integrates external information with the NHANES data. When positivity fails, standard weighting and imputation approaches rely on restrictive, untestable assumptions; the synthesis model offers an alternative that draws on external information instead. Applied to U.S. children and adolescents aged 2-17 years, the synthesis model estimates a mean SBP of 100.5 mmHg (95% confidence interval: 99.9, 101.0), notably lower than estimates from a complete-case analysis or from extrapolation with a statistical model. A diagnostic comparing the performance of the mathematical model in the positive region supported the synthesis results.

📝 Abstract
Introduction: Missing data is a challenge to medical research. Accounting for missing data by imputing or weighting conditional on covariates relies on the variable with missingness being observed at least some of the time for all unique covariate values. This requirement is referred to as positivity, and violations can result in bias. Here, we review a novel approach to addressing positivity violations in the context of systolic blood pressure. Methods: To illustrate the proposed approach, we estimate the mean systolic blood pressure among children and adolescents aged 2-17 years in the United States using data from the 2017-2018 National Health and Nutrition Examination Survey (NHANES). As blood pressure was never measured for those aged 2-7, there exists a positivity violation by design. Using a recently proposed synthesis of statistical and mathematical models, we integrate external information with NHANES to address our motivating question. Results: With the synthesis model, the estimated mean systolic blood pressure was 100.5 (95% confidence interval: 99.9, 101.0), which is notably lower than the estimate from either a complete-case analysis or extrapolation from a statistical model. The synthesis results were supported by a diagnostic comparing the performance of the mathematical model in the positive region. Conclusion: Positivity violations pose a threat to quantitative medical research, and standard approaches to addressing nonpositivity rely on restrictive untestable assumptions. Using a synthesis model, like the one detailed here, offers a viable alternative through integration of external information.
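
The positivity requirement and the idea of filling the nonpositive region with external information can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the records are made-up values (not NHANES data), survey weights are ignored, and `external_model` is a hypothetical stand-in for a mathematical model, not the one used by the authors.

```python
from collections import defaultdict

# Toy records as (age, sbp); None means SBP was never measured.
# Made-up values for illustration only, not NHANES data.
records = [
    (3, None), (5, None), (7, None),
    (8, 98.0), (8, 100.0), (12, 104.0), (12, 106.0), (16, 110.0),
]


def nonpositive_ages(rows):
    """Return the ages at which the outcome is never observed."""
    observed = defaultdict(int)
    for age, sbp in rows:
        observed[age] += sbp is not None
    return sorted(age for age, n in observed.items() if n == 0)


def external_model(age):
    """Hypothetical external model for mean SBP at a given age (assumed form)."""
    return 90.0 + 1.0 * age


def observed_means(rows):
    """Age-specific mean SBP wherever SBP was observed (the positive region)."""
    by_age = defaultdict(list)
    for age, sbp in rows:
        if sbp is not None:
            by_age[age].append(sbp)
    return {age: sum(v) / len(v) for age, v in by_age.items()}


def synthesis_mean(rows):
    """Overall mean: observed age-specific means where available,
    the external model where the outcome was never observed."""
    means = observed_means(rows)
    total = sum(means.get(age, external_model(age)) for age, _ in rows)
    return total / len(rows)


def diagnostic_gaps(rows):
    """External model minus observed mean, by age, in the positive region."""
    return {age: external_model(age) - m for age, m in observed_means(rows).items()}


print(nonpositive_ages(records))
print(synthesis_mean(records))
print(diagnostic_gaps(records))
```

In this sketch, ages with no observed outcome are flagged first, the external model is used only for those ages, and the gaps in the positive region give a rough check on how well the external model tracks the observed data.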
Problem

Research questions and friction points this paper is trying to address.

Addressing missing data in public health research
Overcoming positivity violations in statistical models
Estimating systolic blood pressure with incomplete data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Synthesis of statistical and mathematical models
Integration of external information with NHANES
Addressing positivity violations in blood pressure data
Paul N Zivich
Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Bonnie E Shook-Sa
Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Stephen R Cole
Professor of Epidemiology, UNC Chapel Hill
Epidemiology, Statistics, Causality
Eric T Lofgren
Washington State University
School for Global Health
Jessie K Edwards
Department of Epidemiology, University of North Carolina, Chapel Hill
epidemiology, causal inference, infectious diseases, HIV