🤖 AI Summary
This work addresses the core challenge in causal discovery: reliably inferring causal relationships from observational data while effectively integrating expert knowledge with statistical evidence. It treats large language models (LLMs) as imperfect experts and proposes a semantics-driven mechanism to extract structural priors. Specifically, the method prompts an LLM to interpret variable semantics and generate causal constraints, which are then fused with conditional-independence test results within the constraint-driven Causal Assumption-Based Argumentation (Causal ABA) framework to construct causally plausible graphs. Evaluated on standard benchmarks and semantically synthesized graphs, the approach achieves state-of-the-art performance, demonstrating that LLMs can significantly enhance both the accuracy and generalization of causal discovery.
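To make the fusion step concrete, here is a minimal sketch of combining LLM-derived structural priors with conditional-independence evidence to prune candidate edges. The variable names, the toy "LLM" outputs, and the `fuse` helper are all hypothetical illustrations; the actual method resolves such constraints symbolically within Causal ABA rather than by simple filtering.

```python
# Hypothetical sketch: fuse LLM structural priors with CI-test evidence.
# Everything below is illustrative, not the paper's implementation.
from itertools import combinations

variables = ["smoking", "tar", "cancer"]

# Priors an LLM might emit from variable semantics (assumed for illustration):
llm_required = {("smoking", "tar"), ("tar", "cancer")}   # edges it asserts
llm_forbidden = {("cancer", "smoking")}                  # edges it rules out

# CI-test results: unordered pairs found (conditionally) independent.
ci_independent = {frozenset(("smoking", "cancer"))}      # e.g. given tar

def fuse(variables, required, forbidden, independent):
    """Keep a directed edge iff it is asserted by the prior, not forbidden,
    and its variable pair was not found independent by the CI tests."""
    edges = set()
    for a, b in combinations(variables, 2):
        for u, v in ((a, b), (b, a)):
            if (u, v) in forbidden:
                continue
            if frozenset((u, v)) in independent:
                continue
            if (u, v) in required:
                edges.add((u, v))
    return edges

print(sorted(fuse(variables, llm_required, llm_forbidden, ci_independent)))
```

In this toy run the direct smoking→cancer edge is dropped because the pair is conditionally independent given tar, while the two semantically supported edges survive; the real framework additionally guarantees that the output graph is consistent with all accepted constraints.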
📝 Abstract
Causal discovery seeks to uncover causal relations from data, typically represented as causal graphs, and is essential for predicting the effects of interventions. While expert knowledge is required to construct principled causal graphs, many statistical methods have been proposed to leverage observational data with varying formal guarantees. Causal Assumption-Based Argumentation (ABA) is a framework that uses symbolic reasoning to ensure correspondence between input constraints and output graphs, while offering a principled way to combine data and expertise. We explore the use of large language models (LLMs) as imperfect experts for Causal ABA, eliciting semantic structural priors from variable names and descriptions and integrating them with conditional-independence evidence. Experiments on standard benchmarks and semantically grounded synthetic graphs demonstrate state-of-the-art performance, and we additionally introduce an evaluation protocol to mitigate memorisation bias when assessing LLMs for causal discovery.