π€ AI Summary
This study addresses the inadequacy of conventional sample size calculations in observational studies, which neglect the uncertainty introduced by propensity score estimation and inverse probability of treatment weighting (IPTW), leading to underestimated variance and miscalibrated statistical power. The authors propose a prospective sample size determination framework aligned with IPTW estimators, integrating the propensity score model and marginal structural model into a unified estimating system via generalized estimating equations (GEE) and stacked M-estimation. This approach explicitly propagates nuisance parameter uncertainty and directly models the large-sample variance of the IPTW estimator. For the first time, it ensures consistency between sample size planning and the asymptotic variance of IPTW estimators. Leveraging pilot data for variance factor estimation and a bootstrap stabilization procedure that accounts for both internal and external variability, the method accommodates diverse outcome types. Simulations demonstrate substantially improved power calibration accuracy over traditional randomized trialβbased formulas under challenging scenarios such as unstable weights, sparse outcomes, or heavy-tailed outcome distributions.
π Abstract
In observational studies, accurately characterizing variance is critical for sample size determination, yet unaccounted-for variability from propensity score estimation and the resulting weights limit the accuracy of standard variance approximations for design. Existing approaches often rely on heuristics or randomized controlled trial (RCT) formulas that treat weights as fixed, potentially misaligning prospective design with the causal estimator used at analysis. We propose an estimator-aligned framework for prospective sample size determination based on generalized estimating equations (GEE) and stacked M-estimation. By merging the propensity score model and marginal structural model (MSM) into a single system of estimating equations, the method propagates nuisance-model uncertainty and directly targets the large-sample variance of the IPTW estimator. For study planning, we estimate a pilot-based large-sample variance factor and introduce a bootstrap stabilization procedure that accounts for both within- and between-pilot variability. The framework applies uniformly across binary, count, and continuous outcomes through link-specific GEE representations under a common design principle. Simulation studies motivated by post-marketing safety and healthcare cost applications demonstrate that anchoring design to this variance improves power calibration relative to conventional RCT-style formulas, particularly in settings with weight instability, outcome sparsity, or heavy-tailed variability.