Correct Prediction, Wrong Steps? Consensus Reasoning Knowledge Graph for Robust Chain-of-Thought Synthesis

📅 2026-04-15

🤖 AI Summary
This work addresses a challenge in large language model (LLM) reasoning. Reasoning traces often contain intra-step errors (e.g., logical flaws, hallucinations) and inter-step errors (e.g., overthinking or underthinking), and these are difficult to correct using ground-truth labels alone. The paper first shows that supplying ground-truth labels does not improve reasoning ability. It then proposes CRAFT, a framework built around a Reasoning Knowledge Graph (RKG). The RKG aggregates the consensus segments of multiple candidate reasoning traces, and the graph's topology is used to generate a high-quality reasoning chain, so that both error types are modeled and corrected within one framework. Experiments show that CRAFT improves label-prediction accuracy by more than 10% on average on logical and mathematical reasoning benchmarks, consistently outperforms all baselines, and improves reasoning-trace quality along multiple dimensions.

📝 Abstract
LLM reasoning traces suffer from complex flaws, namely *Step Internal Flaws* (logical errors, hallucinations, etc.) and *Step-wise Flaws* (overthinking, underthinking), which vary from sample to sample. A natural approach would be to provide ground-truth labels to guide LLMs' reasoning. Contrary to intuition, we show that this yields no improvement in reasoning ability. We then propose CRAFT, a unified framework that mitigates both types of step flaws: it builds a Reasoning Knowledge Graph (RKG) from the consensus parts of multiple candidate traces and synthesizes a high-quality trace through topological generation. Our approach improves label-prediction accuracy by more than 10% on average and consistently outperforms all baselines across both logical and mathematical reasoning benchmarks. Further, detailed benchmark evaluation shows that our method also improves the quality of LLMs' reasoning traces along multiple dimensions.
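The consensus-and-topology idea can be illustrated with a minimal Python sketch. This is not the CRAFT implementation. The function names, the `min_support` threshold, the exact-match step normalization, and the mean-position rule used to keep the graph acyclic are all hypothetical choices made for illustration. Steps shared by enough candidate traces become graph nodes, consecutive shared steps become edges, and a topological sort (Kahn's algorithm) yields a single synthesized chain.

```python
import heapq
from collections import Counter, defaultdict


def normalize(step: str) -> str:
    # Hypothetical step-matching rule: two steps count as "the same" if they are
    # identical after lowercasing and collapsing whitespace. A real system would
    # likely need semantic matching instead of exact string equality.
    return " ".join(step.lower().split())


def synthesize_consensus_trace(traces: list[list[str]], min_support: int) -> list[str]:
    """Build a small consensus graph over candidate traces and read one chain off it.

    Nodes: normalized steps that appear in at least `min_support` traces.
    Edges: consecutive consensus steps within a trace.
    Output: a topological order of the graph (Kahn's algorithm).
    """
    support = Counter()              # number of traces containing each step
    positions = defaultdict(list)    # relative position of each step per trace
    for trace in traces:
        steps = [normalize(s) for s in trace]
        seen = set()
        for i, s in enumerate(steps):
            if s in seen:            # repeated step within one trace: count it once
                continue
            seen.add(s)
            support[s] += 1
            positions[s].append(i / max(len(steps) - 1, 1))  # scale to [0, 1]

    consensus = {s for s, c in support.items() if c >= min_support}
    mean_pos = {s: sum(positions[s]) / len(positions[s]) for s in consensus}

    # Keep only edges that point forward in mean position; this rules out cycles,
    # so every consensus step ends up in the topological order.
    succ = defaultdict(set)
    indegree = {s: 0 for s in consensus}
    for trace in traces:
        kept = [s for s in map(normalize, trace) if s in consensus]
        for a, b in zip(kept, kept[1:]):
            if mean_pos[a] < mean_pos[b] and b not in succ[a]:
                succ[a].add(b)
                indegree[b] += 1

    # Kahn's algorithm; among ready steps, prefer earlier mean position, then higher support.
    ready = [(mean_pos[s], -support[s], s) for s in consensus if indegree[s] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, _, s = heapq.heappop(ready)
        order.append(s)
        for t in sorted(succ[s]):
            indegree[t] -= 1
            if indegree[t] == 0:
                heapq.heappush(ready, (mean_pos[t], -support[t], t))
    return order


# Three hypothetical candidate traces for the same question.
traces = [
    ["Let x be the number of apples", "x + 3 = 7", "x = 4"],
    ["Let x be the number of apples", "Maybe x is 5?", "x + 3 = 7", "x = 4"],
    ["let x be the number of apples", "x + 3 = 7", "x + 3 = 7", "x = 5"],
]
result = synthesize_consensus_trace(traces, min_support=2)
print(result)
# → ['let x be the number of apples', 'x + 3 = 7', 'x = 4']
```

In this toy run, the repeated step in the third trace is collapsed into a single node, while the one-off detour ("Maybe x is 5?") and the minority final answer ("x = 5") fall below the support threshold and are dropped. Exact string matching is the weakest assumption here; any practical version would need a more robust way to decide when two steps from different traces express the same reasoning.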
Problem

Research questions and friction points this paper is trying to address.

LLM reasoning
reasoning flaws
chain-of-thought
hallucination
reasoning trace
Innovation

Methods, ideas, or system contributions that make the work stand out.

Chain-of-Thought
Reasoning Knowledge Graph
Consensus Reasoning
LLM Reasoning Robustness
Topological Generation