Generating Effective CoT Traces for Mitigating Causal Hallucination

📅 2026-04-14

🤖 AI Summary
This work addresses two obstacles to event causality identification with smaller large language models (LLMs): causal hallucination and the scarcity of high-quality chain-of-thought (CoT) training data. It introduces the Causal Hallucination Rate (CHR) as a quantitative metric for causal hallucination and formally defines the criteria that effective CoT reasoning traces should meet. Building on these foundations, the authors develop a CoT data generation pipeline tailored to small models. Experimental results show that fine-tuning on the generated traces substantially reduces causal hallucination and improves mean accuracy, and that the fine-tuned models generalize across datasets and difficulty levels and remain robust under misleading intervention prompts.


📝 Abstract
Although large language models (LLMs) excel in complex reasoning tasks, they suffer from severe causal hallucination in event causality identification (ECI), and the problem is most pronounced in smaller models ($\leq$1.5B parameters). A promising approach to address this issue is to fine-tune them with Chain-of-Thought (CoT) traces. However, no CoT trace dataset is currently available for ECI. In this paper, we first investigate the essential criteria that effective CoT traces should meet to mitigate causal hallucination in smaller models. We then design a pipeline to generate CoT traces that meet these criteria. Since there is currently no metric for quantifying causal hallucination, we also introduce a new metric, the Causal Hallucination Rate (CHR), which we use to quantify causal hallucination, guide the formulation of effective CoT trace criteria, and validate the effectiveness of our pipeline. Our experiments show that fine-tuning with the CoT traces generated by our pipeline not only substantially reduces causal hallucination in smaller LLMs but also improves mean accuracy. In addition, the fine-tuned models exhibit strong cross-dataset and cross-difficulty generalization, as well as robustness under misleading intervention prompts.
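The abstract names the Causal Hallucination Rate but does not give its formula. As a rough illustration only, the sketch below assumes a false-positive-style reading: the share of gold non-causal event pairs that a model labels as causal. The function name `causal_hallucination_rate`, this definition, and the toy labels are all assumptions made for illustration, not the paper's definition or data.

```python
from typing import Sequence


def causal_hallucination_rate(gold: Sequence[bool], pred: Sequence[bool]) -> float:
    """Assumed definition (not taken from the paper): the fraction of gold
    non-causal event pairs that the model nevertheless predicts as causal."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    # Keep only the predictions made on pairs whose gold label is non-causal.
    preds_on_non_causal = [p for g, p in zip(gold, pred) if not g]
    if not preds_on_non_causal:
        return 0.0  # no non-causal pairs, so nothing can be hallucinated
    return sum(preds_on_non_causal) / len(preds_on_non_causal)


# Toy labels, invented for illustration: True means "causal".
gold = [True, False, False, True, False, False]
pred = [True, True, False, True, True, False]

chr_value = causal_hallucination_rate(gold, pred)
accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
print(f"CHR = {chr_value:.2f}, accuracy = {accuracy:.2f}")
```

Under this assumed reading, a model can score reasonable accuracy while still over-predicting causality, which is one reason to report a hallucination-specific rate alongside accuracy.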
Problem

Research questions and friction points this paper is trying to address.

causal hallucination
event causality identification
Chain-of-Thought
large language models
small models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Chain-of-Thought
causal hallucination
event causality identification
Causal Hallucination Rate
fine-tuning