🤖 AI Summary
Large reasoning models (LRMs) frequently suffer from unproductive chain-of-thought reasoning—persisting until context exhaustion—yielding incorrect answers and wasted computation, primarily due to their lack of self-awareness regarding capability boundaries. This work identifies, for the first time, that LRMs’ dynamic reasoning confidence and the linear separability of their final-layer hidden states strongly correlate with their intrinsic reasoning limits. Building on this insight, we propose a novel “boundary-aware” paradigm that jointly leverages black-box confidence trajectory analysis and white-box hidden-state discriminability to enable real-time capability assessment and early termination of reasoning. Experiments across diverse reasoning benchmarks demonstrate that our method reduces token consumption by 62.7%–93.6% while preserving original accuracy, significantly enhancing both reliability and efficiency of LRM inference. This establishes a principled foundation for trustworthy, resource-aware reasoning.
📝 Abstract
Large Reasoning Models (LRMs) have shown impressive performance on complex reasoning tasks such as mathematics, yet they also display misbehaviors that expose their limitations. In particular, when faced with hard questions, LRMs often engage in unproductive reasoning until they hit the context limit, producing wrong answers while wasting substantial computation. This phenomenon reflects a fundamental issue: current answering paradigms overlook the relationship between questions and LRMs' capability boundaries. In this paper, we investigate whether LRMs possess self-awareness of their capability boundaries. We begin with the observation that LRMs may know what they cannot solve, as revealed through their expressed reasoning confidence. For black-box models, we find that reasoning expressions reveal boundary signals: confidence trajectories grow at an accelerating rate for solvable problems but converge toward persistent uncertainty for unsolvable ones. For white-box models, we show that the hidden states of the last input token encode boundary information, with solvable and unsolvable problems linearly separable even before reasoning begins. Building on these findings, we propose two simple yet effective optimization strategies: reasoning expression monitoring and hidden states monitoring. Experiments demonstrate that these boundary-aware strategies enable LRMs to avoid unproductive reasoning without sacrificing accuracy, significantly improving reliability and efficiency by cutting token usage by 62.7%–93.6%.
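The two monitoring strategies can be sketched in miniature. The snippet below is a hypothetical illustration only, not the paper's implementation: it trains a linear probe on synthetic stand-ins for last-input-token hidden states (real probes would use the LRM's final-layer activations), and defines a toy early-termination rule, `should_stop`, that flags a converging, low-confidence trajectory. All names, thresholds, and the synthetic data are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 64

# Synthetic stand-ins for final-layer hidden states of the last input
# token: two clusters, one for "solvable" and one for "unsolvable"
# questions. Real hidden states would come from the model itself.
solvable = rng.normal(loc=+1.0, scale=0.5, size=(200, dim))
unsolvable = rng.normal(loc=-1.0, scale=0.5, size=(200, dim))
X = np.vstack([solvable, unsolvable])
y = np.array([1] * 200 + [0] * 200)  # 1 = solvable

# White-box strategy: fit a logistic-regression probe by gradient
# descent, testing whether the two classes are linearly separable
# before any reasoning tokens are generated.
w = np.zeros(dim)
b = 0.0
lr = 0.1
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid predictions
    w -= lr * (X.T @ (p - y)) / len(y)      # gradient step on weights
    b -= lr * float(np.mean(p - y))         # gradient step on bias

pred = (X @ w + b) > 0
accuracy = float(np.mean(pred == y))
print(f"probe accuracy: {accuracy:.2f}")


def should_stop(confidences, window=4, eps=0.01):
    """Black-box strategy (toy version): terminate reasoning early if
    the recent confidence trajectory has plateaued (spread < eps)
    while remaining low (< 0.5), i.e. uncertainty has converged."""
    if len(confidences) < window:
        return False
    recent = confidences[-window:]
    return (max(recent) - min(recent) < eps) and (recent[-1] < 0.5)
```

For example, `should_stop([0.30, 0.305, 0.302, 0.301])` returns `True` (flat and low), while an accelerating trajectory like `[0.2, 0.4, 0.6, 0.8]` returns `False` and reasoning continues.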