Search-Based Correction of Reasoning Chains for Language Models

📅 2025-05-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Language models often suffer from hidden errors in chain-of-thought (CoT) reasoning, which undermine the reliability of their inference. To address this, we propose a self-correction framework that jointly models the truthfulness of each reasoning step, treated as a latent variable, together with the final answer, enabling end-to-end localization and correction of erroneous steps. Our approach introduces an efficient discrete search algorithm over Boolean truth assignments and a generalizable zero-shot truthfulness discriminator that evaluates step-level credibility without requiring additional annotations. The method combines approximate posterior inference, joint likelihood modeling with the language model, pseudo-label generation, and supervised fine-tuning. Extensive experiments on ProntoQA and GSM8K demonstrate its effectiveness: it reliably identifies flawed reasoning steps and improves final answer accuracy by up to 25% absolute in zero-shot settings.
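
As a rough illustration of the idea (not the paper's exact algorithm), a greedy bit-flip search over veracity assignments might look like the sketch below. Here `joint_log_likelihood` is a hypothetical callable that scores an assignment by the LM's joint likelihood over the veracity bits and the final answer, the proxy reward described in the abstract.

```python
from typing import Callable, Tuple

def search_corrector(
    num_steps: int,
    joint_log_likelihood: Callable[[Tuple[bool, ...]], float],
    max_iters: int = 100,
) -> Tuple[bool, ...]:
    """Greedy coordinate search over Boolean veracity assignments.

    Starts from the all-true assignment (every CoT step presumed
    correct) and repeatedly flips any single bit that improves the
    proxy reward: the LM's joint log-likelihood of the veracity
    assignment and the final answer. Stops at a local optimum.
    """
    current = tuple(True for _ in range(num_steps))
    best = joint_log_likelihood(current)
    for _ in range(max_iters):
        improved = False
        for i in range(num_steps):
            flipped = current[:i] + (not current[i],) + current[i + 1:]
            score = joint_log_likelihood(flipped)
            if score > best:
                current, best, improved = flipped, score, True
        if not improved:
            break  # no single flip improves the reward
    return current
```

Steps assigned False in the returned vector are the ones flagged as erroneous and targeted for correction.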

📝 Abstract
Chain-of-Thought (CoT) reasoning has advanced the capabilities and transparency of language models (LMs); however, reasoning chains can contain inaccurate statements that reduce performance and trustworthiness. To address this, we introduce a new self-correction framework that augments each reasoning step in a CoT with a latent variable indicating its veracity, enabling modeling of all possible truth assignments rather than assuming correctness throughout. To efficiently explore this expanded space, we introduce Search Corrector, a discrete search algorithm over boolean-valued veracity assignments. It efficiently performs otherwise intractable inference in the posterior distribution over veracity assignments by leveraging the LM's joint likelihood over veracity and the final answer as a proxy reward. This efficient inference-time correction method facilitates supervised fine-tuning of an Amortized Corrector by providing pseudo-labels for veracity. The Amortized Corrector generalizes self-correction, enabling accurate zero-shot veracity inference in novel contexts. Empirical results demonstrate that Search Corrector reliably identifies errors in logical (ProntoQA) and mathematical reasoning (GSM8K) benchmarks. The Amortized Corrector achieves comparable zero-shot accuracy and improves final answer accuracy by up to 25%.
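
In symbols, one plausible reading of this setup (notation ours, not taken from the paper): given a question $q$, reasoning steps $s_{1:T}$, latent veracity bits $v_{1:T} \in \{0,1\}^T$, and final answer $a$, the target posterior is

```latex
p\left(v_{1:T} \mid q, s_{1:T}, a\right)
  \;\propto\;
  p_{\mathrm{LM}}\!\left(v_{1:T}, a \mid q, s_{1:T}\right)
```

The discrete search maximizes the right-hand side as a proxy reward, sidestepping the exact posterior, which would require normalizing over all $2^T$ assignments.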
Problem

Research questions and friction points this paper is trying to address.

Correcting inaccuracies in Chain-of-Thought reasoning steps
Efficiently exploring truth assignments via discrete search
Improving zero-shot veracity inference in novel contexts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-correction framework with latent veracity variables
Discrete search algorithm for veracity assignments
Amortized Corrector for zero-shot veracity inference (see the training sketch after this list)
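
A minimal sketch of the amortization step under assumed details: a Hugging Face-style causal LM is fine-tuned on pseudo-labels produced by the search, with the loss masked to the label tokens. The field names (`prompt`, `pseudo_label`) and the textual serialization of veracity bits are our assumptions, not the paper's recipe.

```python
def amortized_corrector_step(model, tokenizer, optimizer, batch):
    """One supervised fine-tuning step on search-derived pseudo-labels.

    Each example pairs a (question + reasoning chain) prompt with the
    veracity assignment found by Search Corrector, serialized as text,
    e.g. "step 1: true, step 2: false, ...".
    """
    prompts = [ex["prompt"] for ex in batch]
    targets = [ex["pseudo_label"] for ex in batch]  # from the search
    enc = tokenizer(
        [p + t for p, t in zip(prompts, targets)],
        return_tensors="pt", padding=True,
    )
    labels = enc["input_ids"].clone()
    labels[enc["attention_mask"] == 0] = -100  # ignore padding
    # Mask prompt tokens so the loss covers only the veracity labels.
    for row, p in enumerate(prompts):
        labels[row, : len(tokenizer(p)["input_ids"])] = -100
    out = model(**enc, labels=labels)
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()
```

Once trained, the corrector predicts veracity assignments in a single forward pass, which is what enables the zero-shot inference in novel contexts described above.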
Minsu Kim
Mila – Quebec AI Institute, KAIST
Jean-Pierre Falet
Mila – Quebec AI Institute, Université de Montréal
Oliver E. Richardson
Mila – Quebec AI Institute, Université de Montréal
Xiaoyin Chen
Mila – Quebec AI Institute, Université de Montréal
Moksh Jain
Mila, Université de Montréal
probabilistic machine learning
Sungjin Ahn
Associate Professor, KAIST
Machine Learning, Deep Learning, Reinforcement Learning, AI, Cognitive Science
Sungsoo Ahn
KAIST
Machine Learning
Yoshua Bengio
Professor of computer science, University of Montreal, Mila, IVADO, CIFAR
Machine learning, deep learning, artificial intelligence