Grounding Multi-Hop Reasoning in Structural Causal Models via Group Relative Policy Optimization

📅 2026-05-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work targets hallucination and logical inconsistency in multi-hop fact verification, which often arise because large language models do not explicitly model the causal dependencies between evidence and claims. The study introduces structural causal models (SCMs) into this task, reportedly for the first time, reframing multi-hop reasoning as a causal inference problem. Empirical results reveal an inverted U-shaped relationship between reasoning chain length and accuracy: performance degrades once chains become excessively complex. To balance the depth and conciseness of reasoning chains, the authors apply rule-based reinforcement learning with Group Relative Policy Optimization (GRPO). The resulting SCM-GRPO framework is reported to significantly outperform state-of-the-art baselines on both the HoVer and EX-FEVER benchmarks while keeping the reasoning interpretable.

📝 Abstract

Multi-Hop Fact Verification (MHFV) necessitates complex reasoning across disparate evidence, posing significant challenges for Large Language Models (LLMs) which often suffer from hallucinations and fractured logical chains. Existing methods, while improving transparency via Chain-of-Thought (CoT), lack explicit modeling of the causal dependencies between evidence and claims. In this work, we introduce a novel framework that grounds reasoning in a Structural Causal Model (SCM), treating verification as a constructive causal inference process. We empirically identify an "inverted U-shaped" correlation between reasoning chain length and accuracy, revealing that excessive structural complexity degrades performance. To address this, we propose a Rule-based Reinforcement Learning strategy using Group Relative Policy Optimization (GRPO). This approach dynamically optimizes the trade-off between structural depth and conciseness. Extensive experiments on HoVer and EX-FEVER demonstrate that our SCM-GRPO framework significantly outperforms state-of-the-art baselines, offering a reliable and interpretable solution for complex fact verification.
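The group-relative step in GRPO can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the reward rules, so `rule_based_reward`, `target_steps`, and `length_penalty` are hypothetical stand-ins for a rule that rewards a correct verdict and penalizes chains far from a preferred length. The only part taken from GRPO as commonly described is the advantage computation, which normalizes each sampled chain's reward against the mean and standard deviation of its own group (population standard deviation is used here as a simplifying choice).

```python
from statistics import mean, pstdev


def rule_based_reward(is_correct: bool, num_steps: int,
                      target_steps: int = 3, length_penalty: float = 0.1) -> float:
    """Hypothetical rule: 1.0 for a correct verdict, minus a penalty that grows
    with the distance between the chain length and an assumed target length."""
    correctness = 1.0 if is_correct else 0.0
    return correctness - length_penalty * abs(num_steps - target_steps)


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against its group: (r_i - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One group of sampled reasoning chains for a single claim:
# (was the verdict correct?, number of reasoning steps). Values are made up.
group = [(True, 3), (True, 7), (False, 2), (False, 6)]
rewards = [rule_based_reward(correct, steps) for correct, steps in group]
advantages = group_relative_advantages(rewards)

for (correct, steps), adv in zip(group, advantages):
    print(f"correct={correct!s:5} steps={steps} advantage={adv:+.3f}")
```

Under these assumed rules, the correct and concise chain receives the largest advantage, and a correct but overlong chain is ranked below it, which is one simple way a reward could encode the depth-versus-conciseness trade-off described above.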

Problem

Research questions and friction points this paper is trying to address.

Multi-Hop Fact Verification

Large Language Models

Causal Dependencies

Hallucinations

Logical Chains

Innovation

Methods, ideas, or system contributions that make the work stand out.

Structural Causal Model

Multi-Hop Fact Verification

Group Relative Policy Optimization