AI Summary
In high-dimensional settings with unknown causal graphs, existing methods struggle to simultaneously achieve computational efficiency and statistical optimality in identifying minimal adjustment sets for causal effect estimation.
Method: This paper proposes a local-global collaborative causal discovery framework that avoids learning the full graph. Instead, it determines whether the causal effect is identifiable from the neighborhood structure of the target variables, then identifies the mediators and their parents to construct the optimal adjustment set, i.e., the valid adjustment set with the lowest asymptotic variance.
Contribution/Results: To our knowledge, this is the first local causal discovery method with provable statistical optimality guarantees. It retains linear time complexity while significantly improving estimation accuracy. Experiments on synthetic and real-world datasets demonstrate superior scalability compared to global methods and higher estimation accuracy than state-of-the-art local approaches.
Abstract
Causal discovery methods can identify valid adjustment sets for causal effect estimation for a pair of target variables, even when the underlying causal graph is unknown. Global causal discovery methods focus on learning the whole causal graph and therefore enable the recovery of optimal adjustment sets, i.e., sets with the lowest asymptotic variance, but they quickly become computationally prohibitive as the number of variables grows. Local causal discovery methods offer a more scalable alternative by focusing on the local neighborhood of the target variables, but are restricted to statistically suboptimal adjustment sets. In this work, we propose Local Optimal Adjustments Discovery (LOAD), a sound and complete causal discovery approach that combines the computational efficiency of local methods with the statistical optimality of global methods. First, LOAD identifies the causal relation between the targets and tests whether the causal effect is identifiable by using only local information. If it is identifiable, it then finds the optimal adjustment set by leveraging local causal discovery to infer the mediators and their parents. Otherwise, it returns the locally valid parent adjustment sets based on the learned local structure. In our experiments on synthetic and realistic data, LOAD outperforms global methods in scalability while providing more accurate effect estimation than local methods.
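To make the phrase "mediators and their parents" concrete, the following is a minimal sketch of one common graphical characterization of the optimal adjustment set: the parents of all nodes on causal paths from treatment to outcome, minus the treatment and every descendant of those nodes. It is not LOAD itself. It assumes the full DAG is already known (LOAD infers the relevant parts locally from data), and the function names and the toy graph are hypothetical.

```python
from collections import deque


def descendants(graph, start):
    """Return all nodes reachable from `start` via directed edges, including `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def parents(graph, node):
    """Return the set of direct parents of `node`."""
    return {p for p, children in graph.items() if node in children}


def optimal_adjustment_set(graph, treatment, outcome):
    """Sketch of the optimal adjustment set in a KNOWN DAG (an assumption).

    `graph` maps each node to a list of its children.
    Returns None when `outcome` is not a descendant of `treatment`.
    """
    reachable = descendants(graph, treatment)
    if outcome not in reachable:
        return None  # no directed path, so there is no causal effect to adjust for

    # Nodes on directed paths from treatment to outcome: mediators plus the outcome.
    causal_nodes = {
        n for n in reachable - {treatment} if outcome in descendants(graph, n)
    }

    # Forbidden nodes: the treatment and every descendant of a causal node.
    forbidden = {treatment}
    for n in causal_nodes:
        forbidden |= descendants(graph, n)

    # Candidate adjusters: parents of the causal nodes.
    candidates = set()
    for n in causal_nodes:
        candidates |= parents(graph, n)

    return candidates - forbidden


# Hypothetical toy DAG: Z confounds X and Y, M mediates X -> Y,
# W is a parent of the mediator, P is a parent of the outcome, D is a child of Y.
toy_graph = {
    "Z": ["X", "Y"],
    "X": ["M"],
    "W": ["M"],
    "M": ["Y"],
    "P": ["Y"],
    "Y": ["D"],
}

result = optimal_adjustment_set(toy_graph, "X", "Y")
print(sorted(result))
```

In this toy graph the confounder Z alone would already be a valid adjustment set; the sketch additionally selects W and P, the other parents of the mediator and of the outcome, which is the kind of set the abstract calls optimal, while the mediator M and the outcome's child D are excluded.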