IRIS: An Iterative and Integrated Framework for Verifiable Causal Discovery in the Absence of Tabular Data

📅 2025-10-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Traditional causal discovery methods depend on structured (tabular) data, which is costly to collect. They also rest on strong assumptions and have limited capacity to uncover novel causal relationships. Existing LLM-based approaches can identify known causal relations but cannot discover new ones.
Method: We propose the first end-to-end, dataset-free causal discovery framework. It iteratively retrieves domain documents and extracts variables, then combines statistical causal discovery algorithms with large language models to discover both known and novel causal relations. A missing-variable recommendation mechanism dynamically expands the causal graph.
Contribution/Results: The framework enables real-time, verifiable causal discovery and missing-variable completion directly from unstructured text. Experiments demonstrate significant improvements in the coverage, novelty, and interpretability of discovered causal relations compared with state-of-the-art baselines.
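
The summary does not specify how the statistical and LLM outputs are combined. As one illustration only, the Python sketch below assumes each source returns a set of directed edges and simply labels every edge by its provenance. The function name merge_hybrid_edges, the labels, and the reading of LLM-only edges as "known" and statistics-only edges as "novel" are hypothetical assumptions, not the actual IRIS procedure.

```python
from typing import Dict, Set, Tuple

# A directed edge (cause, effect) between two variable names.
Edge = Tuple[str, str]


def merge_hybrid_edges(stat_edges: Set[Edge], llm_edges: Set[Edge]) -> Dict[Edge, str]:
    """Label every candidate edge by which source proposed it.

    Hypothetical labelling scheme for illustration only:
      "both"      - proposed by the statistical algorithm and by the LLM
      "llm_only"  - proposed only by the LLM (assumed: a commonly known relation)
      "stat_only" - proposed only by the statistical algorithm (assumed: a
                    candidate novel relation that would need verification)
    """
    merged: Dict[Edge, str] = {}
    for edge in stat_edges | llm_edges:
        if edge in stat_edges and edge in llm_edges:
            merged[edge] = "both"
        elif edge in llm_edges:
            merged[edge] = "llm_only"
        else:
            merged[edge] = "stat_only"
    return merged


if __name__ == "__main__":
    # Toy edge sets standing in for the two discovery sources.
    stat = {("smoking", "tar"), ("tar", "cancer")}
    llm = {("smoking", "cancer"), ("tar", "cancer")}
    for edge, source in sorted(merge_hybrid_edges(stat, llm).items()):
        print(edge, source)
```

Under this assumption, edges supported by both sources could be accepted directly, while statistics-only edges are the natural candidates for further verification. Conflicting directions (A → B from one source, B → A from the other) are kept as two separate edges here and are not resolved.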

📝 Abstract
Causal discovery is fundamental to scientific research, yet traditional statistical algorithms face significant challenges, including expensive data collection, redundant computation for known relations, and unrealistic assumptions. While recent LLM-based methods excel at identifying commonly known causal relations, they fail to uncover novel relations. We introduce IRIS (Iterative Retrieval and Integrated System for Real-Time Causal Discovery), a novel framework that addresses these limitations. Starting with a set of initial variables, IRIS automatically collects relevant documents, extracts variables, and uncovers causal relations. Our hybrid causal discovery method combines statistical algorithms and LLM-based methods to discover known and novel causal relations. In addition to causal discovery on initial variables, the missing variable proposal component of IRIS identifies and incorporates missing variables to expand the causal graphs. Our approach enables real-time causal discovery from only a set of initial variables without requiring pre-existing datasets.
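
The abstract describes an outer loop: start from initial variables, discover relations among them, propose missing variables, and expand the graph. A minimal Python sketch of that control flow follows. The names iterative_discovery, discover_edges and propose_missing and the max_rounds cap are hypothetical, and the real retrieval, extraction and LLM components are replaced by caller-supplied functions.

```python
from typing import Callable, List, Set, Tuple

Edge = Tuple[str, str]  # (cause, effect)


def iterative_discovery(
    initial_vars: List[str],
    discover_edges: Callable[[Set[str]], Set[Edge]],
    propose_missing: Callable[[Set[str], Set[Edge]], Set[str]],
    max_rounds: int = 3,
) -> Tuple[Set[str], Set[Edge]]:
    """Expand a variable set until no new variables are proposed.

    discover_edges stands in (hypothetically) for document retrieval, variable
    extraction and hybrid causal discovery; propose_missing stands in for the
    missing-variable proposal step. Both are supplied by the caller.
    """
    variables = set(initial_vars)
    edges = discover_edges(variables)
    for _ in range(max_rounds):
        new_vars = propose_missing(variables, edges) - variables
        if not new_vars:
            break  # the graph has stopped growing
        variables |= new_vars
        edges = discover_edges(variables)  # re-run discovery on the expanded set
    return variables, edges


if __name__ == "__main__":
    # Toy stand-ins: a fixed set of "true" edges and a proposer that
    # suggests the mediator "tar" once.
    known = {("smoking", "tar"), ("tar", "cancer")}

    def toy_discover(vs: Set[str]) -> Set[Edge]:
        return {(a, b) for (a, b) in known if a in vs and b in vs}

    def toy_propose(vs: Set[str], es: Set[Edge]) -> Set[str]:
        return set() if "tar" in vs else {"tar"}

    final_vars, final_edges = iterative_discovery(["smoking", "cancer"], toy_discover, toy_propose)
    print(sorted(final_vars), sorted(final_edges))
```

With the toy stand-ins, the loop adds the mediator "tar" in the first round, rediscovers edges over the expanded set, and stops in the second round because nothing new is proposed. Re-running discovery after each expansion keeps the returned edges consistent with the returned variables, and the round cap guards against a proposer that never stops.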
Problem


Discovering causal relations without pre-existing tabular data
Combining statistical and LLM methods to find novel relations
Automatically expanding causal graphs by identifying missing variables
Innovation


Iterative framework combining statistical and LLM methods
Automatically collects documents and extracts causal variables
Identifies missing variables to expand causal graphs
Tao Feng
Monash University
Lizhen Qu
Monash University
Niket Tandon
Principal Researcher, Microsoft Research Bangalore
Commonsense Reasoning, AI
Gholamreza Haffari
Monash University