CARE: Turning LLMs Into Causal Reasoning Expert

📅 2025-11-19
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) rely heavily on the semantics of variable names rather than on observational data when performing causal discovery, which weakens their causal reasoning. Method: We propose CARE, the first framework that teaches LLMs, via supervised fine-tuning, to structurally integrate and reason over the outputs of classical causal discovery algorithms, which serve as sufficient statistics of the observational data. Algorithmically derived statistical features are supplied as prompt inputs, guiding the model to make data-driven rather than name-based causal judgments. Contribution/Results: CARE overcomes the inherent semantic biases of LLMs. On standard causal discovery benchmarks, the fine-tuned Qwen2.5-1.5B model significantly outperforms mainstream causal discovery algorithms and surpasses GPT-4 despite having only about 0.1% of its parameter count, demonstrating the effectiveness and scalability of the "algorithm + LLM" synergistic paradigm.

📝 Abstract
Large language models (LLMs) have recently demonstrated impressive capabilities across a range of reasoning and generation tasks. However, research studies have shown that LLMs lack the ability to identify causal relationships, a fundamental cornerstone of human intelligence. We first conduct an exploratory investigation of LLMs' behavior when asked to perform a causal-discovery task and find that they mostly rely on the semantic meaning of variable names, ignoring the observational data. This is unsurprising, given that LLMs were never trained to process structured datasets. As a first attempt to tackle this challenge, we prompt the LLMs with the outputs of established causal discovery algorithms designed for observational datasets. These algorithm outputs effectively serve as sufficient statistics of the observational data. However, quite surprisingly, we find that prompting the LLMs with these sufficient statistics decreases their performance in causal discovery. To address this limitation, we propose CARE, a framework that enhances LLMs' causal-reasoning ability by teaching them, through supervised fine-tuning, to effectively utilize the outputs of established causal-discovery algorithms. Experimental results show that a fine-tuned Qwen2.5-1.5B model produced by CARE significantly outperforms both traditional causal-discovery algorithms and state-of-the-art LLMs with over a thousand times more parameters, demonstrating effective utilization of its own knowledge and the external algorithmic clues.
Problem

Research questions and friction points this paper is trying to address.

LLMs fail to identify causal relationships despite strong performance on other reasoning tasks
LLMs rely on the semantics of variable names and ignore the observational data
Naively prompting LLMs with causal-algorithm outputs actually decreases their performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuning LLMs on the outputs of established causal discovery algorithms
Teaching models to effectively utilize external algorithmic clues
Enhancing causal reasoning through supervised fine-tuning
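The idea above can be illustrated with a minimal sketch: serialize the output of a classical causal discovery algorithm (e.g., the edge set proposed by the PC algorithm, plus a statistical feature such as a partial correlation) into a prompt, and pair it with a ground-truth label to form one supervised fine-tuning example. The function name, prompt template, and field names here are hypothetical, not taken from the paper.

```python
# Illustrative sketch (not the paper's actual format): build one SFT example
# that asks the model to judge an edge from algorithmic/statistical evidence,
# rather than from the semantics of the variable names alone.

def build_sft_example(var_a, var_b, algo_edges, partial_corr, label):
    """Assemble a (prompt, completion) pair for supervised fine-tuning.

    algo_edges: directed edges proposed by a classical algorithm, as (x, y) pairs.
    partial_corr: a statistical feature derived from the observational data.
    label: ground-truth answer used as the fine-tuning target.
    """
    edge_lines = "\n".join(f"- {x} -> {y}" for x, y in algo_edges)
    prompt = (
        f"Variables: {var_a}, {var_b}\n"
        f"Edges proposed by the PC algorithm:\n{edge_lines}\n"
        f"Partial correlation({var_a}, {var_b}): {partial_corr:.3f}\n"
        f"Question: does {var_a} cause {var_b}? Answer yes or no."
    )
    return {"prompt": prompt, "completion": label}

example = build_sft_example(
    "X1", "X2",
    algo_edges=[("X1", "X2"), ("X3", "X2")],
    partial_corr=0.412,
    label="yes",
)
```

Because the statistical evidence is embedded directly in the prompt, fine-tuning on many such examples can, in principle, teach the model to weigh the algorithmic clues instead of defaulting to name-based priors.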