Correct Reasoning Paths Visit Shared Decision Pivots

📅 2025-09-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Intermediate reasoning steps in large language models’ chain-of-thought (CoT) inference are often unverifiable, undermining reliability. Method: This paper introduces the “decision pivot”—a minimal, verifiable node that all correct reasoning paths must traverse. Grounded in the novel hypothesis that distinct valid paths converge to the same pivot, we propose a self-consistent calibration framework requiring no ground-truth annotations. Our approach comprises diverse path sampling, validator-guided pivot discovery, short-path reconstruction, and post-training on self-generated data. Contribution/Results: Evaluated on LogiQA, MedQA, and MATH500, our method significantly improves reasoning accuracy, empirically validating the pivot hypothesis and its cross-domain generalizability. To our knowledge, this is the first work to formalize reasoning-path convergence as a learnable, verifiable structural constraint—establishing a new paradigm for enhancing the trustworthiness of LLM inference.

📝 Abstract
Chain-of-thought (CoT) reasoning exposes the intermediate thinking process of large language models (LLMs), yet verifying those traces at scale remains unsolved. In response, we introduce the idea of decision pivots: minimal, verifiable checkpoints that any correct reasoning path must visit. We hypothesize that correct reasoning paths, though stylistically diverse, converge on the same pivot set, while incorrect ones violate at least one pivot. Leveraging this property, we propose a self-training pipeline that (i) samples diverse reasoning paths and mines shared decision pivots, (ii) compresses each trace into pivot-focused short-path reasoning using an auxiliary verifier, and (iii) post-trains the model on its self-generated outputs. The proposed method aligns reasoning without ground-truth reasoning data or external metrics. Experiments on standard benchmarks such as LogiQA, MedQA, and MATH500 show the effectiveness of our method.
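The pivot-mining and short-path steps of the pipeline can be illustrated with a minimal sketch. This toy code assumes each sampled reasoning path has already been normalized into a list of intermediate claims (the paper itself uses an auxiliary verifier and an LLM for this; the function names and the claim representation here are hypothetical):

```python
from collections import Counter

def mine_decision_pivots(paths, min_support=1.0):
    """Return claims shared by at least min_support of the sampled paths.

    A 'pivot' is modeled as a claim that appears in every
    (or nearly every) correct-looking reasoning path.
    """
    counts = Counter()
    for path in paths:
        for claim in set(path):  # count each claim once per path
            counts[claim] += 1
    threshold = min_support * len(paths)
    return {claim for claim, c in counts.items() if c >= threshold}

def short_path(path, pivots):
    """Compress a trace to its pivot-focused steps, preserving order."""
    return [claim for claim in path if claim in pivots]

# Three stylistically different toy paths for the same question.
paths = [
    ["restate question", "x = 3", "check x > 0", "answer: 9"],
    ["draw diagram", "x = 3", "answer: 9"],
    ["try x = 2", "reject", "x = 3", "answer: 9"],
]
pivots = mine_decision_pivots(paths)       # {"x = 3", "answer: 9"}
compressed = short_path(paths[0], pivots)  # ["x = 3", "answer: 9"]
```

The compressed short paths would then serve as self-generated post-training data; in practice the convergence check operates on verifier-validated semantic claims rather than exact string matches.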
Problem

Research questions and friction points this paper is trying to address.

Verifying chain-of-thought reasoning traces at scale remains unsolved
Correct reasoning paths must visit shared minimal verifiable checkpoints
Method aligns reasoning without ground truth data or external metrics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mining shared decision pivots from diverse reasoning paths
Compressing reasoning traces into pivot-focused short paths
Self-training models using self-generated reasoning outputs
Dongkyu Cho
Ph.D. Student, New York University
Foundation Models, Post-training, Continual Learning
Amy B. Z. Zhang
Amazon
Bilel Fehri
Amazon
Sheng Wang
University of Washington
Rumi Chunara
New York University
ML/AI in Public Health, Data Science, Health Inequities, Social Computing
Rui Song
Amazon
Hengrui Cai
UC Irvine