🤖 AI Summary
Intermediate reasoning steps in large language models’ chain-of-thought (CoT) inference are often unverifiable, undermining reliability. Method: This paper introduces the “decision pivot”—a minimal, verifiable node that all correct reasoning paths must traverse. Grounded in the novel hypothesis that distinct valid paths converge to the same pivot, we propose a self-consistent calibration framework requiring no ground-truth annotations. Our approach comprises diverse path sampling, validator-guided pivot discovery, short-path reconstruction, and post-training on self-generated data. Contribution/Results: Evaluated on LogiQA, MedQA, and MATH500, our method significantly improves reasoning accuracy, empirically validating the pivot hypothesis and its cross-domain generalizability. To our knowledge, this is the first work to formalize reasoning-path convergence as a learnable, verifiable structural constraint—establishing a new paradigm for enhancing the trustworthiness of LLM inference.
📝 Abstract
Chain-of-thought (CoT) reasoning exposes the intermediate thinking process of large language models (LLMs), yet verifying those traces at scale remains unsolved. In response, we introduce decision pivots: minimal, verifiable checkpoints that any correct reasoning path must visit. We hypothesize that correct reasoning paths, though stylistically diverse, converge on the same pivot set, while incorrect paths violate at least one pivot. Leveraging this property, we propose a self-training pipeline that (i) samples diverse reasoning paths and mines shared decision pivots, (ii) compresses each trace into pivot-focused short-path reasoning using an auxiliary verifier, and (iii) post-trains the model on its self-generated outputs. The proposed method aligns reasoning without ground-truth reasoning data or external metrics. Experiments on standard benchmarks such as LogiQA, MedQA, and MATH500 show the effectiveness of our method.
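The pivot-mining step (i) can be pictured as an intersection over sampled paths: a claim that appears in every (or nearly every) sampled reasoning path is a candidate decision pivot. The sketch below is illustrative only and not the paper's implementation; the function `mine_decision_pivots`, the string-claim representation of paths, and the `min_support` threshold are all assumptions introduced here for clarity.

```python
from collections import Counter

def mine_decision_pivots(paths, min_support=1.0):
    """Illustrative pivot mining: find claims shared across sampled paths.

    paths: list of reasoning paths, each a list of normalized
    intermediate claims (strings). A claim becomes a pivot candidate
    when its support (fraction of paths containing it) is at least
    min_support. This is a toy stand-in for the paper's
    validator-guided pivot discovery, not the actual method.
    """
    n = len(paths)
    counts = Counter()
    for path in paths:
        for claim in set(path):  # count each claim once per path
            counts[claim] += 1
    return {claim for claim, k in counts.items() if k / n >= min_support}

# Toy example: three stylistically different paths to the same answer.
paths = [
    ["premise A holds", "A implies B", "answer is B"],
    ["restate question", "premise A holds", "A implies B", "answer is B"],
    ["premise A holds", "eliminate option C", "A implies B", "answer is B"],
]
pivots = mine_decision_pivots(paths)
# Stylistic detours ("restate question", "eliminate option C") drop out;
# only the claims all paths traverse survive.
```

In a full pipeline, the surviving candidates would then be checked by an auxiliary verifier before the compressed, pivot-focused short paths are used for post-training.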