Premise-Augmented Reasoning Chains Improve Error Identification in Math Reasoning with LLMs

📅 2025-02-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses the challenge of localizing errors in the long mathematical reasoning chains produced by large language models (LLMs). The authors propose Premise-Augmented Reasoning Chains (PARC), which reformulate linear reasoning as a directed acyclic graph driven by explicit premise dependencies, the first method to introduce an explicit premise-linking mechanism. They also construct PERL, the first benchmark dataset designed jointly for premise identification and error detection. The approach combines chain-of-thought prompting, graph-structured modeling, premise extraction, and step-wise verification, and is evaluated on both open- and closed-source LLMs. Results show that LLMs achieve 90% recall in premise identification, and that verifying each step under its premises improves error-identification accuracy by 6 to 16 percentage points absolute. The core contributions are: (1) a premise-driven, interpretable reasoning representation; and (2) PERL, the first benchmark for joint premise and error evaluation.

📝 Abstract
Chain-of-Thought (CoT) prompting enhances mathematical reasoning in large language models (LLMs) by enabling detailed step-by-step solutions. However, due to the verbosity of LLMs, the resulting reasoning chains can be long, making it harder to verify the reasoning steps and trace issues resulting from dependencies between the steps that may be farther away in the sequence of steps. Importantly, mathematical reasoning allows each step to be derived from a small set of premises, which are a subset of the preceding steps in the reasoning chain. In this paper, we present a framework that identifies the premises for each step, to improve the evaluation of reasoning. We restructure conventional linear reasoning chains into Premise Augmented Reasoning Chains (PARC) by introducing premise links, resulting in a directed acyclic graph where the nodes are the steps and the edges are the premise links. Through experiments with a PARC-based dataset that we built, namely PERL (Premises and ERrors identification in LLMs), we demonstrate that LLMs can reliably identify premises within complex reasoning chains. In particular, even open-source LLMs achieve 90% recall in premise identification. We also show that PARC helps to identify errors in reasoning chains more reliably. The accuracy of error identification improves by 6% to 16% absolute when step-by-step verification is carried out in PARC under the premises. Our findings highlight the utility of premise-centric representations in addressing complex problem-solving tasks and open new avenues for improving the reliability of LLM-based reasoning evaluations.
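The restructuring described in the abstract can be sketched in a few lines of code. The following is a minimal illustration (not the authors' implementation; the `Step` class and `build_parc` function are hypothetical names): each reasoning step records the indices of the earlier steps it depends on, and collecting those premise links yields a directed acyclic graph over the chain.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    idx: int
    text: str
    premises: list[int] = field(default_factory=list)  # indices of earlier steps

def build_parc(steps: list[Step]) -> dict[int, list[int]]:
    """Return the premise-link graph as a map: step index -> its premises.

    Premise links may only point to earlier steps, so the resulting
    graph is acyclic by construction; we validate that invariant here.
    """
    graph: dict[int, list[int]] = {}
    for step in steps:
        assert all(p < step.idx for p in step.premises), \
            "premise links must point to earlier steps"
        graph[step.idx] = list(step.premises)
    return graph

# Toy chain: step 2 is derived from steps 0 and 1 only, so a verifier
# can check it against just those two premises rather than the full prefix.
chain = [
    Step(0, "Ann has 3 apples."),
    Step(1, "Bob gives Ann 2 apples."),
    Step(2, "Ann now has 3 + 2 = 5 apples.", premises=[0, 1]),
]
parc = build_parc(chain)
print(parc[2])  # → [0, 1]
```

The point of the representation is visible in the comment on the toy chain: step-wise verification only needs to condition on a step's premises, which is what the paper reports as improving error identification.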
Problem

Research questions and friction points this paper is trying to address.

Enhances error identification in LLMs
Restructures reasoning chains with premises
Improves reliability of mathematical reasoning evaluations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Premise Augmented Reasoning Chains
Directed Acyclic Graph structure
Improved error identification accuracy