Language Models can perform Single-Utterance Self-Correction of Perturbed Reasoning

📅 2025-06-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Prior work underestimates the intrinsic error-correction mechanisms of large language models (LLMs), relying heavily on multi-step verification or fine-tuning to mitigate chain-of-thought (CoT) failures.

Method: This study systematically perturbs CoT reasoning by introducing logical, arithmetic, and structural errors, then evaluates the zero-shot, single-turn self-correction capabilities of open-weight LLMs (e.g., Llama, Qwen, Phi) on GSM8K and MMLU.

Contribution/Results: Models without fine-tuning achieve 72–89% single-turn self-correction success under diverse perturbations, surpassing conventional two-stage verification baselines. This is the first empirical demonstration that open LLMs possess robust, inherent single-turn error correction, both implicit and explicit, challenging the prevailing assumption that reliable CoT requires task-specific fine-tuning. The findings suggest that advanced reasoning is an emergent property amplified by model architecture and scale, rather than a behavior contingent on external supervision or iterative refinement.
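Neither the summary nor the abstract spells out how a perturbation is injected. The sketch below shows one way an arithmetic perturbation could be applied, assuming newline-separated reasoning steps; `perturb_arithmetic` and its "add 1 to 3 to the last integer in a step" rule are hypothetical illustrations, not the authors' procedure.

```python
import random
import re

# Hypothetical: only non-negative integers are treated as perturbable numbers.
INTEGER = re.compile(r"\d+")


def perturb_arithmetic(cot: str, step_index: int, rng: random.Random) -> str:
    """Inject an arithmetic error into one CoT step and truncate after it.

    Assumes steps are separated by newlines. The last integer in the chosen
    step (typically the result of a calculation) is increased by 1 to 3, and
    every later step is dropped, so the model must continue from a wrong
    intermediate value.
    """
    steps = cot.split("\n")
    target = steps[step_index]
    matches = list(INTEGER.finditer(target))
    if not matches:
        raise ValueError(f"step {step_index} contains no integer to perturb")
    last = matches[-1]
    # A strictly positive shift guarantees the new value differs from the original.
    wrong_value = int(last.group()) + rng.randint(1, 3)
    steps[step_index] = target[: last.start()] + str(wrong_value) + target[last.end():]
    return "\n".join(steps[: step_index + 1])


if __name__ == "__main__":
    cot = (
        "Tom has 3 apples.\n"
        "He buys 4 more, so 3 + 4 = 7.\n"
        "He eats 2, so 7 - 2 = 5."
    )
    # Prints the first two steps, with the "= 7" result replaced by a wrong value.
    print(perturb_arithmetic(cot, step_index=1, rng=random.Random(0)))
```

Logical and structural errors, which the summary also mentions, would need their own injectors; the truncate-after-perturbation design is what lets a single continuation reveal whether the model notices the error.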

📝 Abstract

Large Language Models (LLMs) have demonstrated impressive mathematical reasoning capabilities, yet their performance remains brittle to minor variations in problem description and prompting strategy. Furthermore, reasoning is vulnerable to sampling-induced errors, which autoregressive models must primarily address through self-correction in additionally generated tokens. To better understand the self-correction capabilities of recent models, we conduct experiments measuring models' ability to self-correct synthetic perturbations introduced into their Chain of Thought (CoT) reasoning. We observe robust single-utterance intrinsic self-correction behavior across a range of open-weight models and datasets, ranging from subtle, implicit corrections to explicit acknowledgments and corrections of errors. Our findings suggest that LLMs, including those not finetuned for long CoT, may possess stronger intrinsic self-correction capabilities than commonly shown in the literature. The presence of this ability suggests that recent "reasoning" model work involves amplification of traits already meaningfully present in models.
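The abstract distinguishes subtle, implicit corrections from explicit acknowledgments of errors but does not say how outcomes are scored. A minimal sketch of one possible scoring scheme follows, assuming numeric final answers (as in GSM8K-style math problems) and a keyword heuristic for explicit acknowledgments; the cue list, `classify`, and `self_correction_rates` are hypothetical.

```python
import math
import re
from typing import Dict, Optional, Sequence

# Hypothetical cue words signalling an explicit acknowledgment of an error.
# Leading word boundary only, so "mistakes" and "errors" also match.
EXPLICIT_CUE = re.compile(r"\b(?:wait|mistake|actually|recheck|error|oops)", re.IGNORECASE)
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")
LABELS = ("explicit", "implicit", "not_corrected")


def final_number(text: str) -> Optional[float]:
    """Return the last number in the text, ignoring thousands separators."""
    numbers = NUMBER.findall(text.replace(",", ""))
    return float(numbers[-1]) if numbers else None


def classify(continuation: str, gold: float) -> str:
    """Label a continuation generated after a perturbed CoT prefix."""
    answer = final_number(continuation)
    if answer is None or not math.isclose(answer, gold):
        return "not_corrected"
    if EXPLICIT_CUE.search(continuation):
        return "explicit"
    return "implicit"


def self_correction_rates(continuations: Sequence[str], golds: Sequence[float]) -> Dict[str, float]:
    """Fraction of continuations falling into each outcome label."""
    if len(continuations) != len(golds) or not golds:
        raise ValueError("need equally sized, non-empty inputs")
    labels = [classify(c, g) for c, g in zip(continuations, golds)]
    return {label: labels.count(label) / len(labels) for label in LABELS}
```

A continuation counts as corrected only if its final answer matches the gold answer. Keyword matching is a crude proxy for "explicit"; manual annotation or a model-based judge would likely separate the two categories more reliably.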

Problem

Research questions and friction points this paper is trying to address.

LLMs' brittleness to minor variations in problem description and prompting

Self-correction of sampling-induced errors in reasoning

Measuring intrinsic self-correction in perturbed CoT

Innovation

Methods, ideas, or system contributions that make the work stand out.

Single-utterance self-correction of perturbed reasoning

Experiments with synthetic perturbations in Chain of Thought

Intrinsic self-correction across open-weight models and datasets
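"Single-utterance" here means the correction must happen inside the same generation that contains the error, with no second prompt asking the model to check its work. The contrast with a two-stage verification baseline can be sketched as follows, assuming a `generate` function that maps a prompt string to a continuation; the plain-text prompt templates and the function names are hypothetical.

```python
from typing import Callable, List

# Any text-completion function: prompt string in, continuation string out.
Generate = Callable[[str], str]


def single_utterance(generate: Generate, question: str, perturbed_prefix: str) -> str:
    """One generation: the model continues its own (perturbed) reasoning."""
    # Hypothetical plain-text template; a chat model would instead place the
    # prefix at the start of the assistant turn in its own chat format.
    prompt = f"Question: {question}\nAnswer: {perturbed_prefix}"
    return perturbed_prefix + generate(prompt)


def two_stage(generate: Generate, question: str, perturbed_prefix: str) -> str:
    """Verification baseline: a second call explicitly asks for a review."""
    draft = single_utterance(generate, question, perturbed_prefix)
    review_prompt = (
        f"Question: {question}\nProposed answer: {draft}\n"
        "Check the reasoning above for errors and give a corrected final answer.\n"
    )
    return generate(review_prompt)


if __name__ == "__main__":
    prompts: List[str] = []

    def fake_generate(prompt: str) -> str:
        # Stand-in for a real model: records prompts and returns a fixed continuation.
        prompts.append(prompt)
        return " He eats 2 of the 7 apples, so 7 - 2 = 5."

    question = "Tom has 3 apples, buys 4 more, and eats 2. How many are left?"
    print(single_utterance(fake_generate, question, "He buys 4 more, so 3 + 4 = 9."))
    print(len(prompts))  # 1: a single model call
```

The only difference between the two functions is the extra review call, which is exactly the external step that single-utterance self-correction does without.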
