🤖 AI Summary
This work addresses the challenge that large language models struggle to accurately identify and correct their own errors during reasoning. To this end, the authors propose the Thought-ICS framework, which structures the reasoning process into discrete, semantically coherent thought steps and adds an error-verification and backtracking mechanism that mimics the human behavior of monitoring for mistakes at decision points and resampling alternatives. The authors present this as the first approach to achieve efficient autonomous self-correction without external intervention. By combining structured prompting with iterative thought sampling, Thought-ICS improves self-correction performance by 20–40% when oracle verification is available and substantially outperforms existing self-correction baselines in fully autonomous settings.
📝 Abstract
Self-correction in language models remains elusive. In this work, we explore whether language models can explicitly localize errors in incorrect reasoning, as a path toward building AI systems that can effectively correct themselves. We introduce a prompting method that structures reasoning as discrete, semantically coherent thought steps, and show that models can reliably localize errors within this structure, while failing to do so in conventional, unstructured chain-of-thought reasoning. Motivated by how the human brain monitors errors at discrete decision points and resamples alternatives, we introduce Iterative Correction Sampling of Thoughts (Thought-ICS), a self-correction framework. Thought-ICS iteratively prompts the model to generate reasoning one discrete, complete thought at a time (each thought representing a deliberate decision by the model), creating natural boundaries for precise error localization. Upon verification, the model localizes the first erroneous step, and the system backtracks to generate alternative reasoning from the last correct point. When asked to correct reasoning verified as incorrect by an oracle, Thought-ICS achieves a 20–40% self-correction lift. In a fully autonomous setting without external verification, it outperforms contemporary self-correction baselines.
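The sample-verify-backtrack loop described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: `generate_next_thought`, `find_first_error`, `sampler`, and `verifier` are hypothetical placeholders standing in for model and verifier calls.

```python
def generate_next_thought(prefix, sampler):
    """Sample the next discrete thought given the reasoning so far.

    `sampler` is a placeholder for a model call; it returns None
    when the reasoning chain is complete.
    """
    return sampler(prefix)


def find_first_error(thoughts, verifier):
    """Return the index of the first erroneous thought, or None if all pass.

    `verifier` is a placeholder for the verification step (an oracle,
    or the model itself in the autonomous setting).
    """
    for i in range(len(thoughts)):
        if not verifier(thoughts[:i + 1]):
            return i
    return None


def thought_ics(sampler, verifier, max_thoughts=8, max_rounds=5):
    """Iteratively sample thoughts; on error, backtrack to the last
    correct point and resample alternative reasoning from there."""
    thoughts = []
    for _ in range(max_rounds):
        # Extend the chain one discrete thought at a time.
        while len(thoughts) < max_thoughts:
            t = generate_next_thought(thoughts, sampler)
            if t is None:  # sampler signals the chain is complete
                break
            thoughts.append(t)
        # Verify and localize the first erroneous step.
        err = find_first_error(thoughts, verifier)
        if err is None:
            return thoughts  # chain verified as correct
        thoughts = thoughts[:err]  # backtrack past the erroneous step
    return thoughts
```

A toy `sampler`/`verifier` pair over a fixed target chain is enough to exercise the backtracking path: if the sampler emits one wrong thought, the loop truncates the chain at that step and resamples the remainder.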