🤖 AI Summary
Language models often suffer from hidden errors in chain-of-thought (CoT) reasoning, undermining the reliability of their inference. To address this, we propose a self-correction framework that jointly models the truthfulness of each reasoning step, treated as a latent variable, and the final answer, enabling end-to-end localization and correction of erroneous steps. The approach introduces an efficient discrete search algorithm over Boolean truth assignments and a generalizable zero-shot truthfulness discriminator, which evaluates step-level credibility without requiring additional annotations. The method combines approximate posterior inference, joint likelihood modeling with the language model, pseudo-label generation, and supervised fine-tuning. Extensive experiments on ProntoQA and GSM8K demonstrate its effectiveness: it reliably identifies flawed reasoning steps and improves final-answer accuracy by up to 25% in zero-shot settings.
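The pseudo-label generation step described above can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: `exhaustive_search` is a brute-force stand-in for the discrete search, and `score_fn` is a placeholder for the LM's joint likelihood over the veracity assignment and the final answer.

```python
from itertools import product

def exhaustive_search(n_steps, score):
    # Brute-force maximization over Boolean assignments (fine for short
    # chains; stands in for the paper's more efficient discrete search).
    best = max(product([True, False], repeat=n_steps), key=score)
    return list(best), score(best)

def make_pseudo_labels(chains, score_fn):
    """Turn search results into (step, veracity) pseudo-labels that could
    supervise fine-tuning of an amortized step-level discriminator.

    `score_fn(chain, assignment)` is a hypothetical placeholder for the
    LM's joint likelihood used as a proxy reward.
    """
    dataset = []
    for chain in chains:
        assignment, _ = exhaustive_search(len(chain), lambda a: score_fn(chain, a))
        dataset.extend(zip(chain, assignment))  # one (step_text, bool) per step
    return dataset
```

With a toy `score_fn` that rewards labeling a known-bad step as false, the returned pairs are exactly the supervised examples an amortized corrector would train on.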
📝 Abstract
Chain-of-Thought (CoT) reasoning has advanced the capabilities and transparency of language models (LMs); however, reasoning chains can contain inaccurate statements that reduce performance and trustworthiness. To address this, we introduce a new self-correction framework that augments each reasoning step in a CoT with a latent variable indicating its veracity, enabling modeling of all possible truth assignments rather than assuming correctness throughout. To explore this expanded space efficiently, we propose Search Corrector, a discrete search algorithm over Boolean-valued veracity assignments. It performs otherwise intractable inference in the posterior distribution over veracity assignments by leveraging the LM's joint likelihood over veracity and the final answer as a proxy reward. This efficient inference-time correction method facilitates supervised fine-tuning of an Amortized Corrector by providing pseudo-labels for veracity. The Amortized Corrector generalizes self-correction, enabling accurate zero-shot veracity inference in novel contexts. Empirical results demonstrate that Search Corrector reliably identifies errors in logical (ProntoQA) and mathematical (GSM8K) reasoning benchmarks. The Amortized Corrector achieves comparable zero-shot accuracy and improves final answer accuracy by up to 25%.
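The discrete search over Boolean veracity assignments can be sketched as a greedy bit-flip hill climb. This is a minimal illustration, not the paper's exact algorithm: the function name `search_corrector` and the one-bit-at-a-time strategy are assumptions, and `score` is a placeholder for the LM's joint likelihood over veracity and the final answer (the proxy reward).

```python
def search_corrector(n_steps, score, max_iters=50):
    """Greedy bit-flip search over Boolean veracity assignments.

    `score(assignment)` is a hypothetical stand-in for the LM's joint
    likelihood of the assignment together with the final answer.
    """
    current = [True] * n_steps               # start by trusting every step
    best = score(current)
    for _ in range(max_iters):
        improved = False
        for i in range(n_steps):
            candidate = current.copy()
            candidate[i] = not candidate[i]  # flip one step's veracity
            s = score(candidate)
            if s > best:
                current, best, improved = candidate, s, True
        if not improved:                     # local optimum reached: stop
            break
    return current, best
```

With a toy score that measures agreement with a known assignment, the search flips exactly the mislabeled bits; each evaluation of `score` here corresponds to one LM likelihood query, which is why an amortized discriminator is attractive at inference time.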