IRIS: An Iterative and Integrated Framework for Verifiable Causal Discovery in the Absence of Tabular Data

📅 2025-10-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Traditional causal discovery methods depend on structured (tabular) data, which is expensive to collect; they also rest on strong assumptions and spend computation rediscovering relations that are already known. Existing LLM-based approaches identify commonly known causal relations well but fail to uncover novel ones. Method: IRIS is an end-to-end framework that requires no pre-existing dataset. Starting from a set of initial variables, it iteratively retrieves domain documents, extracts variables from them, and combines statistical causal discovery algorithms with large language models to discover both known and novel causal relations. A missing-variable proposal component identifies variables absent from the initial set and adds them to the causal graph. Contribution/Results: The framework enables real-time, verifiable causal discovery and missing-variable completion directly from unstructured text. Experiments demonstrate improvements in coverage, novelty, and interpretability of the discovered causal relations compared to state-of-the-art baselines.

📝 Abstract

Causal discovery is fundamental to scientific research, yet traditional statistical algorithms face significant challenges, including expensive data collection, redundant computation for known relations, and unrealistic assumptions. While recent LLM-based methods excel at identifying commonly known causal relations, they fail to uncover novel relations. We introduce IRIS (Iterative Retrieval and Integrated System for Real-Time Causal Discovery), a novel framework that addresses these limitations. Starting with a set of initial variables, IRIS automatically collects relevant documents, extracts variables, and uncovers causal relations. Our hybrid causal discovery method combines statistical algorithms and LLM-based methods to discover known and novel causal relations. In addition to causal discovery on initial variables, the missing variable proposal component of IRIS identifies and incorporates missing variables to expand the causal graphs. Our approach enables real-time causal discovery from only a set of initial variables without requiring pre-existing datasets.
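The retrieve-and-expand loop described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy `CORPUS`, the `CANDIDATES` vocabulary, and the helper names `retrieve`, `propose_missing`, and `run` are hypothetical, and plain substring matching stands in for the document retrieval and variable extraction that IRIS performs. The sketch covers only the iteration over variables, not the causal discovery step.

```python
# Toy in-memory corpus standing in for retrieved documents (hypothetical data).
CORPUS = [
    "smoking increases the risk of lung cancer",
    "air pollution and smoking both damage lung tissue",
    "lung cancer risk rises with air pollution exposure",
    "regular exercise improves sleep quality",
]

# Hypothetical candidate vocabulary; this is an assumption, not the paper's extractor.
CANDIDATES = ["smoking", "lung cancer", "air pollution", "exercise", "sleep quality"]


def retrieve(variables, corpus):
    """Return the documents that mention at least one current variable."""
    return [doc for doc in corpus if any(v in doc for v in variables)]


def propose_missing(variables, docs, candidates):
    """Propose candidates found in the retrieved documents but not yet tracked."""
    return [
        c for c in candidates
        if c not in variables and any(c in doc for doc in docs)
    ]


def run(initial, corpus, candidates, max_rounds=3):
    """Iterate retrieval and variable expansion until nothing new is proposed."""
    variables = list(initial)
    for _ in range(max_rounds):
        docs = retrieve(variables, corpus)
        new_variables = propose_missing(variables, docs, candidates)
        if not new_variables:
            break  # the variable set has stabilised
        variables.extend(new_variables)
    return variables


if __name__ == "__main__":
    print(run(["smoking"], CORPUS, CANDIDATES))
```

Starting from the single variable `smoking`, the loop pulls in the two variables that share documents with it and leaves the unrelated `exercise` and `sleep quality` out, which mirrors the idea of growing the graph only from relevant text.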

Problem

Research questions and friction points this paper is trying to address.

Discovering causal relations without pre-existing tabular data

Combining statistical and LLM methods to find novel relations

Automatically expanding causal graphs by identifying missing variables
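One way to picture the first friction point is to treat each document as a row and each variable as a binary column, which yields a stand-in for the missing tabular data. The sketch below is an assumption for illustration, not the paper's procedure: `occurrence_table` and `phi` are hypothetical helpers, variable detection is naive substring matching, and the phi coefficient is just one simple measure of association that a statistical algorithm could start from.

```python
import math


def occurrence_table(docs, variables):
    """Build one row per document with 1 where the variable name appears, else 0."""
    return [[1 if v in doc else 0 for v in variables] for doc in docs]


def phi(table, i, j):
    """Phi coefficient between binary columns i and j (0.0 if a column is constant)."""
    n11 = sum(1 for row in table if row[i] and row[j])
    n10 = sum(1 for row in table if row[i] and not row[j])
    n01 = sum(1 for row in table if not row[i] and row[j])
    n00 = sum(1 for row in table if not row[i] and not row[j])
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return 0.0
    return (n11 * n00 - n10 * n01) / denom


if __name__ == "__main__":
    # Hypothetical documents and variable names.
    docs = ["rain wet", "rain wet", "sun", "sun"]
    variables = ["rain", "wet", "sun"]
    table = occurrence_table(docs, variables)
    print(phi(table, 0, 1), phi(table, 0, 2))
```

Association in such a table says nothing about direction, so it is only a starting point for the statistical side of a hybrid method.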

Innovation

Methods, ideas, or system contributions that make the work stand out.

Iterative framework combining statistical and LLM methods

Automatically collects documents and extracts causal variables

Identifies missing variables to expand causal graphs
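The combination of statistical and LLM-based evidence can be sketched as a merge that records where each edge came from. The merge rule below is an assumption, since the summary does not specify how IRIS reconciles the two sources; `merge_edges` and its labels are hypothetical. Edges supported only by the statistical side are the natural candidates for novel relations, because the LLM side is described as strong on commonly known ones.

```python
def merge_edges(statistical, llm):
    """Merge an undirected statistical skeleton with directed LLM-proposed edges.

    statistical: set of frozenset({a, b}) pairs (undirected adjacencies).
    llm: set of (cause, effect) tuples.
    Returns a dict mapping each edge tuple to a provenance label.
    """
    merged = {}
    for cause, effect in llm:
        in_skeleton = frozenset((cause, effect)) in statistical
        merged[(cause, effect)] = "both" if in_skeleton else "llm_only"

    covered = {frozenset(edge) for edge in llm}
    for pair in statistical:
        if pair not in covered:
            # Direction is unknown here, so the pair is stored in sorted order.
            merged[tuple(sorted(pair))] = "statistical_only"
    return merged


if __name__ == "__main__":
    # Hypothetical inputs for illustration.
    skeleton = {
        frozenset(("smoking", "lung cancer")),
        frozenset(("air pollution", "lung cancer")),
    }
    llm_edges = {("smoking", "lung cancer")}
    for edge, source in sorted(merge_edges(skeleton, llm_edges).items()):
        print(edge, source)
```

Keeping the provenance label on every edge also supports the verifiability goal in the title, since each relation can be traced back to the evidence that produced it.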
