🤖 AI Summary
In causal discovery, accurately identifying causal structures from observational data alone remains challenging, while intervention-based experiments suffer from high costs and low sample efficiency. To address this, we propose the first interpretable intervention-target recommendation framework that integrates large language model (LLM) priors with numerical optimization. Our method embeds LLMs' world knowledge into causal graph modeling and active learning, and employs a multi-round intervention feedback mechanism to enable end-to-end LLM-augmented experimental design. Evaluated on four real-world benchmarks, our approach achieves significantly higher causal structure identification accuracy and sample efficiency than conventional uncertainty- or gradient-driven methods, and even surpasses human expert judgment. This work provides the first empirical validation of substantial, generalizable gains from LLMs in scientific experimental design.
📝 Abstract
Designing informative experiments and selecting optimal intervention targets is a longstanding problem in scientific and causal discovery. Identifying the underlying causal structure from observational data alone is inherently difficult. Interventional data, on the other hand, are crucial to causal discovery, yet gathering sufficient interventional data is usually expensive and time-consuming. Previous approaches commonly use uncertainty or gradient signals to determine intervention targets. However, such purely numerical approaches may yield suboptimal results because these guiding signals are estimated inaccurately in the early rounds, when interventional data are limited. In this work, we investigate a different approach: whether Large Language Models (LLMs) can assist with intervention targeting in causal discovery by drawing on their rich world knowledge about experimental design. Specifically, we present oursfull (ours) -- a robust framework that effectively incorporates LLMs to augment existing numerical approaches to intervention targeting in causal discovery. Across four realistic benchmarks of varying scales, ours demonstrates significant improvements and robustness over existing methods and even surpasses human experts, demonstrating the usefulness of LLMs in assisting with experimental design for scientific discovery.
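The multi-round loop described in the abstract can be sketched roughly as follows. This is a minimal illustrative mock, not the paper's actual method: the LLM prior is replaced by fixed placeholder scores (a real system would prompt an LLM to rank candidate variables), the uncertainty signal is a simple edge-posterior variance, and the "interventional feedback" step is a toy update. All function names, variable names, and the blending weight `alpha` are assumptions for illustration.

```python
def llm_prior_scores(variables):
    # Hypothetical stand-in for an LLM query: in a real system this would
    # prompt an LLM to score variables as intervention candidates using
    # world knowledge. Here we return fixed mock scores by list position.
    return {v: 1.0 / (i + 1) for i, v in enumerate(variables)}

def uncertainty_scores(posterior_edges):
    # Numerical signal: a variable is interesting to intervene on when the
    # posterior probabilities of its outgoing edges are near 0.5
    # (p * (1 - p) peaks at p = 0.5 and vanishes at 0 or 1).
    scores = {}
    for (src, _dst), p in posterior_edges.items():
        scores[src] = scores.get(src, 0.0) + p * (1.0 - p)
    return scores

def select_intervention_target(candidates, posterior_edges, alpha=0.5):
    # Blend the LLM prior with the numerical uncertainty signal;
    # alpha controls how much weight the LLM prior receives.
    prior = llm_prior_scores(candidates)
    unc = uncertainty_scores(posterior_edges)
    combined = {v: alpha * prior.get(v, 0.0) + (1 - alpha) * unc.get(v, 0.0)
                for v in candidates}
    return max(combined, key=combined.get)

def run_rounds(variables, posterior_edges, n_rounds=3):
    # Multi-round feedback loop: pick a target, "intervene", update beliefs.
    targets = []
    for _ in range(n_rounds):
        candidates = [v for v in variables if v not in targets]
        if not candidates:
            break
        target = select_intervention_target(candidates, posterior_edges)
        targets.append(target)
        # Mock feedback: pretend the intervention resolves every edge
        # out of the target to certainty (0 or 1).
        for edge in posterior_edges:
            if edge[0] == target:
                posterior_edges[edge] = round(posterior_edges[edge])
    return targets
```

In the actual framework, the selection would additionally feed the observed interventional outcomes back into both the causal-graph posterior and subsequent LLM prompts; the sketch only shows the skeleton of the select-intervene-update cycle.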