🤖 AI Summary
This work addresses factual error correction in multi-hop reasoning, where existing methods struggle to pinpoint semantic errors within complex inference chains and are hindered by the scarcity of paired training data. To overcome these limitations, the authors propose CECoR, a framework built on a “decompose-and-inject” paradigm: it first decomposes multi-hop claims into interpretable reasoning steps, then injects controlled perturbations into those steps to synthesize high-quality training pairs, which alleviates annotation scarcity. The model is then optimized with a two-stage process combining supervised fine-tuning and reinforcement learning. Experiments show that CECoR outperforms distantly supervised methods and few-shot large language model (LLM) baselines on multi-hop benchmarks, generalizes to single-hop correction, and remains stable in the presence of noisy evidence.
📝 Abstract
Factual Error Correction (FEC) aims to revise inaccurate text into statements that are factually consistent with external evidence. Although recent methods perform well on single-hop correction, they often treat claims as atomic units and struggle with multi-hop cases that require compositional reasoning across multiple evidence sources. This challenge is further amplified by limited paired data and difficulties in locating semantic errors within complex reasoning chains. We present CECoR (Compositional Error Correction via Reasoning-aware Synthesis), a reasoning-aware framework that introduces a Decomposition and Injection paradigm for compositional error correction. CECoR decomposes multi-hop claims into interpretable reasoning steps and injects controlled perturbations to synthesize high-quality training pairs. A two-stage learning strategy combining supervised fine-tuning and reinforcement learning improves factual accuracy and robustness. Comprehensive evaluations show that CECoR achieves strong performance on multi-hop benchmarks, outperforming both distantly supervised methods and few-shot LLM baselines. It also generalizes effectively to single-hop correction and remains stable under noisy evidence, demonstrating its versatility for real-world factual correction.
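The Decomposition and Injection idea can be illustrated with a toy sketch. This is not the authors' implementation: the `Step` triple representation, the entity-swap perturbation, and the names `inject_entity_swap`, `compose`, and `synthesize_pair` are all assumptions made for illustration. The sketch takes a claim that has already been decomposed into reasoning steps, corrupts exactly one step, and emits an (erroneous claim, correct claim) training pair.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One hop of a decomposed claim (hypothetical triple representation)."""
    subject: str
    relation: str
    obj: str

    def render(self) -> str:
        return f"{self.subject} {self.relation} {self.obj}"


def inject_entity_swap(steps, distractors, rng):
    """Replace the object of one randomly chosen step with a wrong entity."""
    idx = rng.randrange(len(steps))
    target = steps[idx]
    candidates = [d for d in distractors if d != target.obj]
    if not candidates:
        raise ValueError("need a distractor that differs from the original object")
    corrupted = list(steps)
    corrupted[idx] = Step(target.subject, target.relation, rng.choice(candidates))
    return corrupted, idx


def compose(steps):
    """Join the reasoning steps back into a single claim string."""
    return "; ".join(s.render() for s in steps) + "."


def synthesize_pair(steps, distractors, seed=0):
    """Build one synthetic training pair: corrupted input, original target."""
    rng = random.Random(seed)  # seeded so the perturbation is reproducible
    corrupted, idx = inject_entity_swap(steps, distractors, rng)
    return {
        "input": compose(corrupted),   # claim containing one injected error
        "target": compose(steps),      # the original, correct claim
        "perturbed_step": idx,         # which reasoning step was corrupted
    }


if __name__ == "__main__":
    steps = [
        Step("The film Inception", "was directed by", "Christopher Nolan"),
        Step("Christopher Nolan", "was born in", "London"),
    ]
    pair = synthesize_pair(steps, ["Paris", "Steven Spielberg"], seed=0)
    print(pair["input"])
    print(pair["target"])
```

Because the perturbation is applied to a single known step, each synthetic pair also records where the error sits in the reasoning chain, which is the kind of signal that is hard to obtain from unpaired data. A realistic pipeline would presumably use a model for decomposition and a richer set of perturbation types; neither is specified in the abstract.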