🤖 AI Summary
Large language models (LLMs) exhibit poor robustness in causal discovery tasks, degrading to near-random performance under data perturbations. To address this, we propose Modular Context Learning (MCL), a structured prompting framework inspired by chain-of-thought and tree-of-thought reasoning. MCL decomposes causal inference into four interpretable, sequential submodules: observable variable identification, correlation analysis, confounder discrimination, and causal graph generation. When instantiated with OpenAI's o-series and DeepSeek-R models and evaluated on the Corr2Cause benchmark, MCL achieves a nearly threefold accuracy improvement over conventional methods. We further show that reasoning chain length and structural complexity critically influence robustness. This work introduces the first modular architecture for LLM-based causal discovery, substantially improving both accuracy and perturbation resilience, and establishes a general, interpretable, and scalable framework for cross-domain causal inference.
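The four-stage decomposition described above can be sketched as a sequential prompting loop, where each submodule's output is fed into the next stage's prompt. The stage prompt templates and the `call_llm` stub below are illustrative assumptions for exposition, not the paper's actual implementation:

```python
# Minimal sketch of a modular in-context pipeline in the spirit of MCL.
# Stage names, prompt wording, and call_llm are hypothetical placeholders.

STAGES = [
    ("variables", "Identify the observable variables in: {ctx}"),
    ("correlations", "Given {prev}, list the pairwise correlations."),
    ("confounders", "Given {prev}, flag likely confounders."),
    ("graph", "Given {prev}, output the causal graph as a list of edges."),
]

def call_llm(prompt: str) -> str:
    """Placeholder for a real reasoning-model API call (e.g., o-series, DeepSeek-R)."""
    return f"[model output for: {prompt[:40]}...]"

def run_pipeline(problem: str) -> dict:
    """Run each submodule in order, threading the previous stage's output forward."""
    outputs = {}
    prev = problem
    for name, template in STAGES:
        prompt = template.format(ctx=problem, prev=prev)
        prev = call_llm(prompt)  # each stage sees only its structured context
        outputs[name] = prev
    return outputs
```

The key design point is that each submodule receives a narrow, structured context rather than the full problem plus the entire reasoning trace, which is what makes the intermediate steps inspectable.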
📝 Abstract
Causal inference remains a fundamental challenge for large language models. Recent advances in internal reasoning with large language models have sparked interest in whether state-of-the-art reasoning models can robustly perform causal discovery, a task on which conventional models often suffer from severe overfitting and near-random performance under data perturbations. We study causal discovery on the Corr2Cause benchmark using the emerging OpenAI o-series and DeepSeek-R model families and find that these reasoning-first architectures achieve significantly greater native gains than prior approaches. To capitalize on these strengths, we introduce a modular in-context pipeline inspired by the Chain-of-Thought and Tree-of-Thoughts methodologies, yielding nearly threefold improvements over conventional baselines. We further probe the pipeline's impact by analyzing reasoning chain length and complexity and by conducting qualitative and quantitative comparisons between conventional and reasoning models. Our findings suggest that while advanced reasoning models represent a substantial leap forward, carefully structured in-context frameworks are essential to maximize their capabilities, and they offer a generalizable blueprint for causal discovery across diverse domains.