🤖 AI Summary
This study investigates why doubly robust (DR) estimators often underperform inverse probability weighting (IPW) or outcome-modeling-only approaches in finite samples with limited covariate overlap. Through theoretical analysis and extensive Monte Carlo simulations, the authors systematically evaluate the bias and variance of DR estimators across varying degrees of overlap and model specifications. They find that specification of the outcome model influences DR performance more strongly than specification of the propensity score model, and that this dominance intensifies as overlap diminishes. Under poor overlap, DR estimators frequently exhibit inflated bias or variance due to extreme weights, regardless of model specification. The study thus recommends first assessing the degree of overlap and, when overlap is poor, restricting inference to a subpopulation with adequate overlap using methods such as trimming or overlap weighting.
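To make the comparison concrete, below is a minimal sketch of the three estimators being contrasted (outcome modeling, IPW, and the DR/AIPW combination) on simulated data with deliberately poor overlap. The data-generating process, variable names, and use of scikit-learn are illustrative assumptions, not the paper's actual simulation design.

```python
# Sketch of the three ATE estimators on simulated data with poor overlap.
# Setup is illustrative, not the paper's simulation design.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 1))
# A large coefficient in the treatment model pushes propensity scores
# toward 0 or 1, i.e., poor covariate overlap.
e_true = 1 / (1 + np.exp(-3.0 * X[:, 0]))
Z = rng.binomial(1, e_true)
Y = 1.0 + 2.0 * X[:, 0] + 1.0 * Z + rng.normal(size=n)  # true ATE = 1

# Fitted nuisance models: propensity score and group-specific outcome models.
e_hat = LogisticRegression().fit(X, Z).predict_proba(X)[:, 1]
m1 = LinearRegression().fit(X[Z == 1], Y[Z == 1]).predict(X)
m0 = LinearRegression().fit(X[Z == 0], Y[Z == 0]).predict(X)

# Outcome-modeling (g-computation) estimator.
ate_om = np.mean(m1 - m0)
# IPW (Horvitz-Thompson style) estimator; sensitive to extreme weights.
ate_ipw = np.mean(Z * Y / e_hat - (1 - Z) * Y / (1 - e_hat))
# Doubly robust (AIPW) estimator: outcome model plus a weighted residual
# correction. Extreme weights enter through the residual terms, which is
# how poor overlap can inflate the DR estimator's bias and variance.
ate_dr = np.mean(
    m1 - m0
    + Z * (Y - m1) / e_hat
    - (1 - Z) * (Y - m0) / (1 - e_hat)
)
print(ate_om, ate_ipw, ate_dr)
```

Rerunning this sketch with a misspecified outcome model (e.g., fitting the outcome regressions without X) versus a misspecified propensity model is one way to reproduce the paper's qualitative finding that the outcome model matters more.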
📝 Abstract
The doubly robust (DR) estimator is popular for evaluating causal effects in observational studies and is often perceived as more desirable than inverse probability weighting (IPW) or outcome modeling alone because it provides extra protection against model misspecification. However, double robustness is an asymptotic property that may not hold in finite samples. We investigate how the finite-sample performance of the DR estimator depends on the degree of covariate overlap between comparison groups. Using analytical illustrations and extensive simulations across scenarios with different degrees of covariate overlap and model specifications, we examine the bias and variance of the DR estimator relative to the IPW and outcome modeling estimators. We find that: (i) specification of the outcome model has a stronger influence on the DR estimates than specification of the propensity score model, and this dominance increases as overlap decreases; (ii) with poor overlap, the DR estimator generally amplifies the adverse consequences of extreme weights (large bias and/or variance) regardless of model specification, and is often inferior to both the IPW and outcome modeling estimators. As a practical guide, we recommend always checking the degree of overlap first in applications. In the case of poor overlap, analysts should consider shifting the target population to a subpopulation with adequate overlap via methods such as trimming or overlap weighting.
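The recommended workflow (check overlap first, then trim or overlap-weight if needed) can be sketched as below. The function names are hypothetical, and the 0.1/0.9 trimming thresholds are a common convention in the literature, not thresholds prescribed by the paper.

```python
# Sketch of the recommended workflow: diagnose overlap, then trim or
# overlap-weight. Function names are hypothetical; the 0.1/0.9 cutoffs
# are a common convention, not the paper's rule.
import numpy as np

def check_overlap(e_hat, lo=0.1, hi=0.9):
    """Report the share of units with extreme estimated propensity scores."""
    extreme = (e_hat < lo) | (e_hat > hi)
    return extreme.mean()

def ate_trimmed(Y, Z, e_hat, lo=0.1, hi=0.9):
    """IPW estimate restricted to the subpopulation with adequate overlap."""
    keep = (e_hat >= lo) & (e_hat <= hi)
    y, z, e = Y[keep], Z[keep], e_hat[keep]
    return np.mean(z * y / e - (1 - z) * y / (1 - e))

def ate_overlap_weighted(Y, Z, e_hat):
    """Overlap-weighted estimate: weights e(1-e) smoothly downweight
    units with extreme propensity scores instead of dropping them."""
    w1 = (1 - e_hat)[Z == 1]  # weights for treated units
    w0 = e_hat[Z == 0]        # weights for control units
    return np.average(Y[Z == 1], weights=w1) - np.average(Y[Z == 0], weights=w0)
```

Note that both remedies change the estimand: trimming and overlap weighting target a subpopulation with substantial overlap rather than the full-population average treatment effect, which is exactly the shift in target population the abstract describes.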