AI Summary
Conventional instrumental variable (IV) methods for binary outcomes often rest on exclusion restriction and independence assumptions that fail in practice and cannot be verified empirically. Method: Under the stable confounding and stable treatment effect assumptions, we introduce the concept of a "quasi-instrumental variable" (QIV), a weaker construct requiring only predictive power for the outcome, not full IV validity. Leveraging a structural equilibrium dynamic generative model, we establish a nonparametric identification framework for both the conditional and the marginal average treatment effect among the treated (ATT). We further propose a generalized odds product reparameterization and develop a maximum likelihood estimator and a triply robust semiparametric locally efficient estimator. Results: Simulation studies and an empirical analysis using UK Biobank data demonstrate that the method achieves strong robustness and high estimation accuracy even in small samples and near-boundary settings, broadening the applicability of causal inference for binary outcomes in observational studies.
Abstract
Instrumental variable (IV) methods are central to causal inference from observational data, particularly when a randomized experiment is not feasible. However, of the three conventional core IV identification conditions, only one, IV relevance, is empirically verifiable; often one or both of the other conditions, exclusion restriction and IV independence from unmeasured confounders, are unmet in real-world applications. These challenges are compounded when the outcome is binary, a setting for which robust IV methods remain underdeveloped. A fundamental contribution of this paper is the development of a general identification strategy justified under a structural equilibrium dynamic generative model of so-called stable confounding and a quasi instrumental variable (QIV), i.e. a variable that is only assumed to be predictive of the outcome. Such a model implies (a) stability of confounding on the multiplicative scale, and (b) stability of the additive average treatment effect among the treated (ATT), across levels of that QIV. The former is all that is necessary to ensure a valid test of the causal null hypothesis; together those two conditions establish nonparametric identification and estimation of the conditional and marginal ATT. To address the statistical challenges posed by the need for boundedness in binary outcomes, we introduce a generalized odds product re-parametrization of the observed data distribution, and we develop both a principled maximum likelihood estimator and a triply robust semiparametric locally efficient estimator, which we evaluate through simulations and an empirical application to the UK Biobank.
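To make conditions (a) and (b) concrete, here is a minimal sketch for a binary QIV. The notation is an assumption for illustration and is not taken from the paper: write mu_a(z) = E[Y | A = a, Z = z], let alpha be the multiplicative confounding bias E[Y(0) | A = 1, Z = z] / E[Y(0) | A = 0, Z = z], and let beta be the additive ATT. If both are constant in z, then mu_1(z) = beta + alpha * mu_0(z), which gives two equations in two unknowns when Z takes two values and mu_0(1) differs from mu_0(0), that is, when Z predicts the outcome. Under the causal null (beta = 0), condition (a) alone implies that the ratio mu_1(z) / mu_0(z) does not vary with z. The function names and the toy data below are hypothetical, and this plug-in solver is not the maximum likelihood or triply robust estimator described in the abstract.

```python
def cell_mean(y, a, z, a_val, z_val):
    """Sample mean of Y in the cell A = a_val, Z = z_val."""
    vals = [yi for yi, ai, zi in zip(y, a, z) if ai == a_val and zi == z_val]
    if not vals:
        raise ValueError(f"no observations with A={a_val}, Z={z_val}")
    return sum(vals) / len(vals)


def qiv_solve_binary(y, a, z):
    """Solve mu_1(z) = beta + alpha * mu_0(z) for a binary Z (illustrative only)."""
    mu0 = [cell_mean(y, a, z, 0, q) for q in (0, 1)]
    mu1 = [cell_mean(y, a, z, 1, q) for q in (0, 1)]
    denom = mu0[1] - mu0[0]  # nonzero only if Z predicts Y among the untreated
    if abs(denom) < 1e-12:
        raise ValueError("Z does not predict the outcome among the untreated")
    alpha = (mu1[1] - mu1[0]) / denom  # multiplicative confounding bias
    beta = mu1[0] - alpha * mu0[0]     # additive ATT
    return alpha, beta


def risk_ratio_contrast(y, a, z):
    """Difference in mu_1(z) / mu_0(z) between Z = 1 and Z = 0.

    Under condition (a) and the causal null this contrast is zero in the
    population. Assumes mu_0(z) > 0 in both strata.
    """
    ratios = [cell_mean(y, a, z, 1, q) / cell_mean(y, a, z, 0, q) for q in (0, 1)]
    return ratios[1] - ratios[0]


# Toy data: 10 subjects per (A, Z) cell, with a chosen number of events in each,
# constructed so that alpha = 1.5 and beta = 0.1 hold exactly in the cell means.
events = {(0, 0): 2, (0, 1): 4, (1, 0): 4, (1, 1): 7}
y, a, z = [], [], []
for (a_val, z_val), k in events.items():
    y += [1] * k + [0] * (10 - k)
    a += [a_val] * 10
    z += [z_val] * 10

alpha_hat, beta_hat = qiv_solve_binary(y, a, z)  # approximately (1.5, 0.1)
null_contrast = risk_ratio_contrast(y, a, z)     # nonzero here, since beta != 0
```

Nothing in this plug-in solution keeps the implied counterfactual risk alpha * mu_0(z) inside [0, 1]. That boundedness problem is what the generalized odds product reparameterization in the abstract is introduced to address.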