From "Aha Moments" to Controllable Thinking: Toward Meta-Cognitive Reasoning in Large Reasoning Models via Decoupled Reasoning and Control

📅 2025-08-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large reasoning models (LRMs) often suffer from redundant inference—termed “overthinking”—due to insufficient autonomous regulation of the reasoning process, resulting in excessive computational cost and high response latency, which hinders practical deployment. To address this, we propose MERA, a metacognitive reasoning framework that explicitly decouples reasoning from control for the first time. MERA achieves this via takeover-style data construction and structured supervised fine-tuning to isolate reasoning and control modules, and introduces a control-mask-guided segmented policy optimization enabling dynamic termination or adaptation of reasoning paths. Experiments across multiple complex reasoning benchmarks demonstrate that MERA significantly improves both accuracy and inference efficiency: it reduces token consumption by 37% on average and lowers latency by 29%, while maintaining or enhancing solution quality. MERA establishes a scalable paradigm for controllable, efficient reasoning in LRMs.

📝 Abstract
Large Reasoning Models (LRMs) have demonstrated a latent capacity for complex reasoning by spontaneously exhibiting cognitive behaviors such as step-by-step reasoning, reflection, and backtracking, commonly referred to as "Aha Moments". However, such emergent behaviors remain unregulated and uncontrolled, often resulting in overthinking, where the model continues generating redundant reasoning content even after reaching reliable conclusions. This leads to excessive computational costs and increased latency, limiting the practical deployment of LRMs. The root cause lies in the absence of intrinsic regulatory mechanisms, as current models are unable to monitor and adaptively manage their reasoning process to determine when to continue, backtrack, or terminate. To address this issue, we propose the Meta-cognitive Reasoning Framework (MERA), which explicitly decouples the thinking process into distinct reasoning and control components, thereby enabling the independent optimization of control strategies. Specifically, MERA incorporates a takeover-based data construction mechanism that identifies critical decision points during reasoning and delegates the creation of control signals to auxiliary LLMs, thereby enabling the construction of high-quality reasoning-control data. Additionally, a structured reasoning-control separation is implemented via supervised fine-tuning, enabling the model to generate explicit traces and acquire initial meta-cognitive control capabilities. Finally, MERA employs Control-Segment Policy Optimization (CSPO), which combines segment-wise Group Relative Policy Optimization (GRPO) with a control-masking mechanism to optimize control behavior learning while minimizing interference from irrelevant content. Experiments on various reasoning benchmarks demonstrate that models trained with MERA enhance both reasoning efficiency and accuracy.
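The abstract describes CSPO only at a high level: group-relative advantages from GRPO, combined with a mask that restricts learning to control tokens. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation. The function names, the use of the population standard deviation, and the omission of GRPO's ratio clipping and KL penalty are all simplifications introduced here for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each sampled response's reward against its own group.

    This is the group-relative baseline used by GRPO: subtract the group
    mean and divide by the group standard deviation (population std is an
    assumption here; implementations differ).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def control_masked_objective(token_logps, control_masks, advantages):
    """Average advantage-weighted log-probability over control tokens only.

    token_logps[i][t]   log-probability of token t in response i
    control_masks[i][t] 1 if token t belongs to a control segment, else 0
    advantages[i]       group-relative advantage of response i

    Tokens with mask 0 (reasoning content) contribute nothing, so they do
    not interfere with learning the control behavior. Clipping and the KL
    term of full GRPO are deliberately left out of this sketch.
    """
    total, count = 0.0, 0
    for logps, mask, adv in zip(token_logps, control_masks, advantages):
        for logp, is_control in zip(logps, mask):
            if is_control:
                total += adv * logp
                count += 1
    return total / count if count else 0.0
```

In this sketch the returned value would be maximized during training; a response with above-average reward raises the likelihood of its control tokens, while its reasoning tokens are left untouched by the update.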
Problem

Research questions and friction points this paper is trying to address.

Uncontrolled reasoning behaviors in Large Reasoning Models (LRMs) lead to inefficiency.
Absence of regulatory mechanisms causes redundant reasoning and high computational costs.
Need for decoupled reasoning and control to optimize model performance.
Innovation

Methods, ideas, or system contributions that make the work stand out.

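Decouples the thinking process into separate reasoning and control components, so control strategies can be optimized independently.
Uses a takeover-based data construction mechanism, in which auxiliary LLMs write control signals at critical decision points.
Implements Control-Segment Policy Optimization (CSPO), which combines segment-wise GRPO with control masking.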
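To make the reasoning/control separation in the list above concrete, here is a small sketch of how an explicitly segmented trace could be turned into a per-token control mask. The `<reason>` and `<control>` tag names are hypothetical, chosen here for illustration; this page does not specify the trace format. Whitespace splitting stands in for a real tokenizer.

```python
import re

# Hypothetical tag format: <reason>...</reason> and <control>...</control>.
SEGMENT = re.compile(r"<(reason|control)>(.*?)</\1>", re.DOTALL)


def split_segments(trace):
    """Return the trace as an ordered list of (kind, text) segments."""
    return [(m.group(1), m.group(2).strip()) for m in SEGMENT.finditer(trace)]


def control_mask(segments):
    """Flatten segments into tokens plus a 0/1 mask marking control tokens.

    Whitespace tokenization is an assumption made to keep the sketch
    self-contained; a real pipeline would use the model's tokenizer.
    """
    tokens, mask = [], []
    for kind, text in segments:
        words = text.split()
        tokens.extend(words)
        mask.extend([1 if kind == "control" else 0] * len(words))
    return tokens, mask


trace = (
    "<reason>2 + 2 = 4</reason>"
    "<control>answer verified, stop</control>"
)
tokens, mask = control_mask(split_segments(trace))
```

A mask built this way marks exactly the tokens that a control-focused training objective would update, while the reasoning tokens stay out of the loss.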
Authors

Rui Ha
Beijing University of Posts and Telecommunications
Chaozhuo Li
Microsoft Research Asia
Rui Pu
Beijing University of Posts and Telecommunications
Sen Su
Beijing University of Posts and Telecommunications