AI Summary
Existing causal discovery methods rely on unrealistic assumptions, such as causal sufficiency and the absence of confounding, and are validated predominantly on synthetic data, lacking rigorous evaluation in real-world scientific contexts. Method: We conduct a systematic, interdisciplinary study comprising a comprehensive literature review and in-depth case analyses across three domains (biology, neuroscience, and Earth science) to construct the first data atlas of authentic causal data sources. We identify prevalent assumption violations, including latent confounders, temporal aggregation bias, and measurement noise, and propose a domain-driven evaluation paradigm to replace simulation-centric benchmarks. Contribution/Results: We introduce a practical, application-oriented evaluation framework and release an open assessment guideline for real-world causal discovery. This work bridges the gap between methodological innovation and scientific utility, advancing causal discovery from theoretical exploration toward a trustworthy tool for empirical scientific discovery.
Abstract
Causal discovery aims to automatically uncover causal relationships from data, a capability with significant potential across many scientific disciplines. However, its real-world applications remain limited. Current methods often rely on unrealistic assumptions and are typically evaluated only on simple synthetic datasets, frequently with inadequate evaluation metrics. In this paper, we substantiate these claims by performing a systematic review of the recent causal discovery literature. We present applications in biology, neuroscience, and the Earth sciences, fields where causal discovery holds promise for addressing key challenges. We highlight available simulated and real-world datasets from these domains and discuss common assumption violations that have spurred the development of new methods. Our goal is to encourage the community to adopt better evaluation practices by utilizing realistic datasets and more adequate metrics.