Post-selection inference for causal effects after causal discovery

📅 2024-05-10
📈 Citations: 4
Influential: 0
🤖 AI Summary
Problem: Estimating a causal effect on a graph that was selected from the same data induces selection bias, which invalidates the resulting confidence intervals. Method: We propose a post-selection inference framework for fixed, population-level causal effect parameters that combines resampling with graph screening, departing from the conventional “select-then-infer” paradigm. Built on the PC algorithm, with conditional independence tests under a multivariate Gaussian model, the approach forms its estimate and confidence set from a union over multiple candidate graphs, and it is modular enough to extend to other conditional-independence-based discovery algorithms and distribution families. Contribution/Results: We establish asymptotically correct coverage for confidence sets targeting the true causal effect. Empirical evaluations indicate that the method yields well-calibrated confidence sets even after graph selection.

📝 Abstract
Algorithms for constraint-based causal discovery select graphical causal models among a space of possible candidates (e.g., all directed acyclic graphs) by executing a sequence of conditional independence tests. These may be used to inform the estimation of causal effects (e.g., average treatment effects) when there is uncertainty about which covariates ought to be adjusted for, or which variables act as confounders versus mediators. However, naively using the data twice, for model selection and estimation, would lead to invalid confidence intervals. Moreover, if the selected graph is incorrect, the inferential claims may apply to a selected functional that is distinct from the actual causal effect. We propose an approach to post-selection inference that is based on a resampling and screening procedure, which essentially performs causal discovery multiple times with randomly varying intermediate test statistics. Then, an estimate of the target causal effect and corresponding confidence sets are constructed from a union of individual graph-based estimates and intervals. We show that this construction has asymptotically correct coverage for the true causal effect parameter. Importantly, the guarantee holds for a fixed population-level effect, not a data-dependent or selection-dependent quantity. Most of our exposition focuses on the PC-algorithm for learning directed acyclic graphs and the multivariate Gaussian case for simplicity, but the approach is general and modular, so it may be used with other conditional independence based discovery algorithms and distributional families.
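The resample-then-union idea in the abstract can be illustrated with a deliberately tiny sketch. Everything below is an assumption made for illustration and is not the paper's algorithm: there are only three variables (treatment `x`, outcome `y`, candidate confounder `z`), "causal discovery" is reduced to a single Fisher z-test that decides whether to adjust for `z`, resampling is done by adding standard normal noise to that one test statistic, and the screening step and any adjustment of the confidence level are omitted. The function names `wald_interval` and `union_of_intervals` are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def wald_interval(y, design, alpha):
    """Wald interval for the first coefficient of an OLS fit of y on design."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    n, p = design.shape
    sigma2 = resid @ resid / (n - p)
    cov = sigma2 * np.linalg.inv(design.T @ design)
    half = NormalDist().inv_cdf(1 - alpha / 2) * np.sqrt(cov[0, 0])
    return float(beta[0] - half), float(beta[0] + half)


def union_of_intervals(x, y, z, alpha=0.05, n_resamples=200, seed=0):
    """Toy stand-in: one interval per adjustment set reached under resampling."""
    rng = np.random.default_rng(seed)
    n = len(x)
    # Fisher z-statistic for the marginal independence test of X and Z.
    r = np.corrcoef(x, z)[0, 1]
    stat = np.sqrt(n - 3) * np.arctanh(r)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    # "Resampling" (assumed form): perturb the test statistic with N(0, 1) noise
    # and record which decision each perturbed statistic would lead to.
    perturbed = stat + rng.standard_normal(n_resamples)
    edge_kept = np.abs(perturbed) > crit
    ones = np.ones(n)
    intervals = {}
    for adjust in sorted(set(edge_kept.tolist())):
        # Edge kept -> treat Z as a confounder and adjust for it.
        cols = [x, z, ones] if adjust else [x, ones]
        intervals[adjust] = wald_interval(y, np.column_stack(cols), alpha)
    # Report the convex hull of the union as a single interval (a simplification).
    lo = min(iv[0] for iv in intervals.values())
    hi = max(iv[1] for iv in intervals.values())
    return intervals, (lo, hi)


# Weak Z -> X dependence, so the adjust / do-not-adjust decision is unstable.
rng = np.random.default_rng(1)
n = 300
z = rng.standard_normal(n)
x = 0.1 * z + rng.standard_normal(n)
y = 1.0 * x + 0.8 * z + rng.standard_normal(n)
intervals, hull = union_of_intervals(x, y, z)
print(intervals)
print(hull)
```

The point of the sketch is only the shape of the construction: instead of committing to the one graph picked by the observed test statistic, every graph that is plausible under perturbed statistics contributes an interval, and the reported set covers all of them. The price is a wider set whenever the selection step is unstable.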

Problem

Research questions and friction points this paper is trying to address.

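Confidence intervals become invalid when the same data are used for both graph selection and effect estimation
If the selected graph is wrong, inference may target a selected functional rather than the true causal effect
The approach should carry over to other discovery algorithms and distributional families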

Innovation

Methods, ideas, or system contributions that make the work stand out.

Resampling and screening for causal discovery
Union of graph-based estimates and intervals
Asymptotically correct coverage guarantee
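
The last two points can be written compactly. The notation is assumed here and is not taken from the paper: let $\theta$ be the fixed population-level causal effect, $\hat{\mathcal{G}}$ the set of graphs retained after resampling and screening, and $C_{G}$ the interval computed under graph $G$. Reading "asymptotically correct coverage" as "at least nominal in the limit" is an interpretation; the abstract does not say whether coverage is exact or conservative.

```latex
\hat{C} \;=\; \bigcup_{G \in \hat{\mathcal{G}}} C_{G},
\qquad
\liminf_{n \to \infty} \, \mathbb{P}\bigl(\theta \in \hat{C}\bigr) \;\ge\; 1 - \alpha .
```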