Towards a Universal Causal Reasoner

📅 2026-05-24

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This work addresses the limited general-purpose causal reasoning ability of large language models. The gap persists partly because existing causal datasets mostly benchmark narrow aspects of causality and do not support training for broad generalization. To close it, the authors propose UniCo, a data generation framework that covers all 18 query types across Pearl's causal hierarchy. UniCo grounds symbolic examples in exact causal inference and translates them into natural language and code, which simulates real-world scenarios where causal relationships are implicit. It also filters out examples that admit reasoning shortcuts. After supervised fine-tuning on 66.6K generated instances, Qwen3 and OLMo models improve in-distribution query accuracy by an average of 22.9%. They also outperform state-of-the-art causal data generation frameworks by 8.1% on seven external causal benchmarks. On medical understanding, legal decision-making, and tabular reasoning tasks, their reasoning traces are on average 20.2% more faithful than those of the base models.

📝 Abstract

Despite the importance of causal reasoning, training LLMs to reason causally remains underexplored. Existing data efforts mostly focus on benchmarking LLMs on specific aspects of causality, making them less suitable for training generalizable causal reasoners. To address this, we propose UniCo, a data generation framework that both (1) addresses 18 causal query types across Pearl's Causal Ladder and (2) translates natively symbolic examples into code and natural language forms to simulate real-world use cases where causal terms are not explicitly specified. To ensure data quality, UniCo grounds answers with exact causal inference and filters cases with reasoning shortcuts. Upon supervised finetuning with 66.6K UniCo-generated instances, Qwen3-4B, Qwen3-8B and Olmo-3-7B-Instruct achieve an average improvement of 22.9% across all 18 in-distribution query types, and an 8.1% gain over state-of-the-art causal data generation frameworks on 7 established causal benchmarks outside the training distribution. More importantly, in real-world medical understanding, legal decision-making, and tabular reasoning, UniCo-trained models consistently display more faithful reasoning traces, outperforming the base models by an average of 20.2% in faithfulness metrics. These results suggest that causality-centered training not only strengthens causal reasoning but also equips LLMs with a causal mindset in general reasoning tasks.
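To make "grounding answers with exact causal inference" concrete across the three rungs of Pearl's ladder, here is a minimal sketch that computes an associational, an interventional, and a counterfactual query exactly by enumerating a tiny structural causal model. The SCM, its noise probabilities, and the helper names (`solve`, `worlds`, `prob`, `counterfactual`) are hypothetical illustrations chosen for this example. They are not UniCo's generator or query templates.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy SCM (illustrative only, not taken from UniCo):
#   Z := U_Z,   X := Z xor U_X,   Y := (X and Z) or U_Y
# Exogenous noise terms are independent Bernoulli variables.
P_U = {"U_Z": Fraction(1, 2), "U_X": Fraction(1, 5), "U_Y": Fraction(1, 10)}


def solve(u, do=None):
    """Evaluate endogenous variables for one exogenous setting u,
    replacing a mechanism by a constant when it appears in `do`."""
    do = do or {}
    v = {}
    v["Z"] = do.get("Z", u["U_Z"])
    v["X"] = do.get("X", v["Z"] ^ u["U_X"])
    v["Y"] = do.get("Y", (v["X"] & v["Z"]) | u["U_Y"])
    return v


def worlds():
    """Yield every exogenous assignment with its exact probability."""
    names = list(P_U)
    for bits in product((0, 1), repeat=len(names)):
        u = dict(zip(names, bits))
        p = Fraction(1)
        for name, bit in u.items():
            p *= P_U[name] if bit else 1 - P_U[name]
        yield u, p


def prob(event, given=lambda v: True, do=None):
    """Rungs 1-2: P(event | given) in the (possibly intervened) model."""
    num = den = Fraction(0)
    for u, p in worlds():
        v = solve(u, do)
        if given(v):
            den += p
            if event(v):
                num += p
    return num / den


def counterfactual(event, do, evidence):
    """Rung 3 via abduction-action-prediction: keep only exogenous
    settings consistent with the factual evidence, then re-solve under `do`."""
    num = den = Fraction(0)
    for u, p in worlds():
        if evidence(solve(u)):       # abduction
            den += p
            if event(solve(u, do)):  # action + prediction
                num += p
    return num / den


def y_is_1(v):
    return v["Y"] == 1


def x_is_1(v):
    return v["X"] == 1


association = prob(y_is_1, given=x_is_1)      # P(Y=1 | X=1)
intervention = prob(y_is_1, do={"X": 1})      # P(Y=1 | do(X=1))
cf = counterfactual(y_is_1, {"X": 0},         # P(Y_{X=0}=1 | X=1, Y=1)
                    lambda v: v["X"] == 1 and v["Y"] == 1)

print(association, intervention, cf)  # 41/50 11/20 5/41
```

Because `Z` confounds `X` and `Y`, the observational answer (41/50) differs from the interventional one (11/20), so reading off the correlation gives the wrong rung-2 answer. Treating instances where such answers coincide as "shortcut" cases to filter out is one plausible reading of the abstract's filtering step. It is an assumption here, not the paper's documented criterion.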

Problem

Research questions and friction points this paper is trying to address.

causal reasoning

large language models

generalizable reasoning

causal inference

data generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

causal reasoning

data generation framework

Pearl's Causal Ladder