🤖 AI Summary
Intermediate reasoning steps in large language models’ chain-of-thought (CoT) inference are often unverifiable, undermining reliability. Method: This paper introduces the “decision pivot”—a minimal, verifiable node that all correct reasoning paths must traverse. Grounded in the novel hypothesis that distinct valid paths converge to the same pivot, we propose a self-consistent calibration framework requiring no ground-truth annotations. Our approach comprises diverse path sampling, validator-guided pivot discovery, short-path reconstruction, and post-training on self-generated data. Contribution/Results: Evaluated on LogiQA, MedQA, and MATH500, our method significantly improves reasoning accuracy, empirically validating the pivot hypothesis and its cross-domain generalizability. To our knowledge, this is the first work to formalize reasoning-path convergence as a learnable, verifiable structural constraint—establishing a new paradigm for enhancing the trustworthiness of LLM inference.
📝 Abstract
Chain-of-thought (CoT) reasoning exposes the intermediate thinking process of large language models (LLMs), yet verifying those traces at scale remains unsolved. In response, we introduce decision pivots: minimal, verifiable checkpoints that any correct reasoning path must visit. We hypothesize that correct reasoning paths, though stylistically diverse, converge on the same pivot set, while incorrect paths violate at least one pivot. Leveraging this property, we propose a self-training pipeline that (i) samples diverse reasoning paths and mines shared decision pivots, (ii) compresses each trace into pivot-focused short-path reasoning using an auxiliary verifier, and (iii) post-trains the model on its self-generated outputs. The proposed method aligns reasoning without ground-truth reasoning data or external metrics. Experiments on standard benchmarks such as LogiQA, MedQA, and MATH500 show the effectiveness of our method.
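The pivot-mining step (i) can be pictured as an intersection over sampled paths: a claim that appears in every (or nearly every) sampled reasoning path is a candidate decision pivot. The sketch below is illustrative only and not the paper's implementation; the function `mine_decision_pivots`, the string-claim representation of paths, and the `min_support` threshold are all assumptions introduced here for clarity.

```python
from collections import Counter

def mine_decision_pivots(paths, min_support=1.0):
    """Illustrative pivot mining: find claims shared across sampled paths.

    paths: list of reasoning paths, each a list of normalized
    intermediate claims (strings). A claim becomes a pivot candidate
    when its support (fraction of paths containing it) is at least
    min_support. This is a toy stand-in for the paper's
    validator-guided pivot discovery, not the actual method.
    """
    n = len(paths)
    counts = Counter()
    for path in paths:
        for claim in set(path):  # count each claim once per path
            counts[claim] += 1
    return {claim for claim, k in counts.items() if k / n >= min_support}

# Toy example: three stylistically different paths to the same answer.
paths = [
    ["premise A holds", "A implies B", "answer is B"],
    ["restate question", "premise A holds", "A implies B", "answer is B"],
    ["premise A holds", "eliminate option C", "A implies B", "answer is B"],
]
pivots = mine_decision_pivots(paths)
# Stylistic detours ("restate question", "eliminate option C") drop out;
# only the claims all paths traverse survive.
```

In a full pipeline, the surviving candidates would then be checked by an auxiliary verifier before the compressed, pivot-focused short paths are used for post-training.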