🤖 AI Summary
Causal discovery methods struggle with high-dimensional data and complex dependencies, and prior knowledge about the system can help. This work uses Cluster-DAGs, in which variables are grouped into clusters and the causal relations between clusters are specified, as a prior knowledge framework to warm-start causal discovery. Cluster-DAGs are shown to be more flexible than existing approaches based on tiered background knowledge. Building on this, the authors introduce two modified algorithms: Cluster-PC for the fully observed setting and Cluster-FCI for the partially observed setting (for example, with latent variables). Both are constraint-based, relying on conditional independence tests and using the cluster-level graph as background knowledge. On simulated data, Cluster-PC and Cluster-FCI outperform their respective baselines, PC and FCI run without prior knowledge.
📝 Abstract
Finding cause-effect relationships is of key importance in science. Causal discovery aims to recover a graph from data that succinctly describes these cause-effect relationships. However, current methods face several challenges, especially when dealing with high-dimensional data and complex dependencies. Incorporating prior knowledge about the system can aid causal discovery. In this work, we leverage Cluster-DAGs as a prior knowledge framework to warm-start causal discovery. We show that Cluster-DAGs offer greater flexibility than existing approaches based on tiered background knowledge and introduce two modified constraint-based algorithms, Cluster-PC and Cluster-FCI, for causal discovery in the fully and partially observed settings, respectively. Empirical evaluation on simulated data demonstrates that Cluster-PC and Cluster-FCI outperform their respective baselines without prior knowledge.
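The abstract does not spell out how a Cluster-DAG warm-starts the search, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes that variables in the same cluster start with an unoriented candidate edge, that variables in clusters joined by a cluster-level edge start with a candidate edge oriented along that edge, and that all other pairs start with no edge. The function name `warm_start_edges` and the example clusters are hypothetical.

```python
from itertools import combinations


def warm_start_edges(clusters, cluster_edges):
    """Build a starting edge set from a cluster-level DAG (illustrative assumption).

    clusters: dict mapping cluster name -> list of variable names
    cluster_edges: set of (parent_cluster, child_cluster) pairs
    Returns (undirected, directed):
      undirected: set of frozensets {u, v} for pairs within one cluster
      directed:   set of (u, v) tuples for pairs across adjacent clusters
    """
    undirected = set()
    for members in clusters.values():
        # Within a cluster nothing is known, so every pair stays a candidate.
        for u, v in combinations(members, 2):
            undirected.add(frozenset((u, v)))

    directed = set()
    for parent, child in cluster_edges:
        # Across adjacent clusters, candidate edges follow the cluster-level direction.
        for u in clusters[parent]:
            for v in clusters[child]:
                directed.add((u, v))

    # Pairs in non-adjacent clusters get no candidate edge at all.
    return undirected, directed


# Hypothetical example: three clusters in a chain C1 -> C2 -> C3.
clusters = {"C1": ["a", "b"], "C2": ["c"], "C3": ["d", "e"]}
cluster_edges = {("C1", "C2"), ("C2", "C3")}
undirected, directed = warm_start_edges(clusters, cluster_edges)

n_vars = sum(len(m) for m in clusters.values())
n_complete = n_vars * (n_vars - 1) // 2
n_start = len(undirected) + len(directed)
print(f"candidate edges: {n_start} of {n_complete} in the complete graph")
```

Under these assumptions, a constraint-based search would have fewer variable pairs to test than one that starts from the complete graph, which is one plausible reading of how cluster-level knowledge helps in high dimensions.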