The Landscape of Causal Discovery Data: Grounding Causal Discovery in Real-World Applications

📅 2024-12-02

🏛️ arXiv.org

📈 Citations: 3

✨ Influential: 1

🤖 AI Summary

Problem: Existing causal discovery methods rely on unrealistic assumptions, such as causal sufficiency (no unobserved confounders), and are predominantly validated on synthetic data, with little rigorous evaluation in real-world scientific contexts. Method: We conduct a systematic, interdisciplinary study comprising a comprehensive literature review and in-depth case analyses across three domains (biology, neuroscience, and Earth sciences) to construct a data atlas of simulated and real-world causal datasets. We identify prevalent assumption violations, including latent confounders, temporal aggregation bias, and measurement noise, and propose a domain-driven evaluation paradigm to replace simulation-centric benchmarks. Contribution/Results: We introduce a practical, application-oriented evaluation framework and release an open assessment guideline for real-world causal discovery. This work narrows the gap between methodological innovation and scientific utility, moving causal discovery from theoretical exploration toward a trustworthy tool for empirical science.

📝 Abstract

Causal discovery aims to automatically uncover causal relationships from data, a capability with significant potential across many scientific disciplines. However, its real-world applications remain limited. Current methods often rely on unrealistic assumptions and are evaluated only on simple synthetic toy datasets, often with inadequate evaluation metrics. In this paper, we substantiate these claims by performing a systematic review of the recent causal discovery literature. We present applications in biology, neuroscience, and Earth sciences - fields where causal discovery holds promise for addressing key challenges. We highlight available simulated and real-world datasets from these domains and discuss common assumption violations that have spurred the development of new methods. Our goal is to encourage the community to adopt better evaluation practices by utilizing realistic datasets and more adequate metrics.

Problem

Research questions and friction points this paper is trying to address.

Real-world applications of causal discovery remain limited, in part because methods rely on unrealistic assumptions

Current methods are evaluated mostly on simple synthetic datasets, often with inadequate metrics

The community needs better evaluation practices built on realistic datasets and more adequate metrics

Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematic review of causal discovery literature

Overview of simulated and real-world datasets from biology, neuroscience, and Earth sciences

Recommendations for better evaluation practices and more adequate metrics
