CoT2-Meta: Budgeted Metacognitive Control for Test-Time Reasoning

📅 2026-03-30
🤖 AI Summary
Existing test-time reasoning methods offer little explicit, fine-grained control over operations such as expansion, pruning, repair, and abstention. This work proposes CoT2-Meta, a training-free metacognitive reasoning framework that steers partial reasoning trajectories under a fixed inference budget. It combines strategy-conditioned chain-of-thought generation, tree-structured search, an online process oracle for step-level evaluation, and an explicit meta-controller that decides when to expand, prune, repair, stop, or fall back. Under matched inference budgets on benchmarks including MATH, GPQA, and GSM8K, the method outperforms the strongest baseline by up to 5.2 percentage points (on GPQA). The authors also report better compute scaling, consistent gains across backbone families, and effectiveness on a broader 15-benchmark suite that includes out-of-distribution evaluation.
📝 Abstract
Recent test-time reasoning methods improve performance by generating more candidate chains or searching over larger reasoning trees, but they typically lack explicit control over when to expand, what to prune, how to repair, and when to abstain. We introduce CoT2-Meta, a training-free metacognitive reasoning framework that combines object-level chain-of-thought generation with meta-level control over partial reasoning trajectories. The framework integrates four components: strategy-conditioned thought generation, tree-structured search, an online process oracle for step-level reasoning evaluation, and a meta-controller that allocates computation through expansion, pruning, repair, stopping, and fallback decisions. Under matched inference budgets, CoT2-Meta consistently outperforms strong single-path, sampling-based, and search-based baselines, including ReST-MCTS. On the default backbone, it achieves 92.8 EM on MATH, 90.4 accuracy on GPQA, 98.65 EM on GSM8K, 75.8 accuracy on BBEH, 85.6 accuracy on MMMU-Pro, and 48.8 accuracy on HLE, with gains over the strongest non-CoT2-Meta baseline of +3.6, +5.2, +1.15, +2.0, +4.3, and +4.3 points, respectively. Beyond these core results, the framework remains effective across a broader 15-benchmark suite spanning knowledge and QA, multi-hop reasoning, coding, and out-of-distribution evaluation. Additional analyses show better compute scaling, improved calibration, stronger selective prediction, targeted repair behavior, and consistent gains across backbone families. These results suggest that explicit metacognitive control is a practical design principle for reliable and compute-efficient test-time reasoning systems.
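The control loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the thresholds (`prune_below`, `repair_below`, `stop_above`), the beam width, and the choice to charge one budget unit per generator or repair call are all assumptions made for illustration, and the generator, oracle, and repair functions are toy stand-ins for model calls.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    steps: List[str]          # partial reasoning trajectory
    score: float = 0.0        # latest oracle score for this trajectory


@dataclass
class Result:
    answer: str
    calls: int                # budget units actually spent
    log: List[str] = field(default_factory=list)


def meta_control(question, generate, evaluate, repair, is_final,
                 strategies=("direct", "decompose"), budget=16, beam=3,
                 prune_below=0.3, repair_below=0.6, stop_above=0.9,
                 fallback_answer="abstain"):
    """Hypothetical meta-controller: expand, repair, prune, stop, or fall back."""
    frontier, best, calls, log = [Node([])], None, 0, []
    while frontier and calls < budget:
        # Expand the highest-scoring node; keep at most `beam` nodes overall.
        frontier.sort(key=lambda n: n.score, reverse=True)
        node, frontier = frontier[0], frontier[1:beam]
        for strategy in strategies:
            if calls >= budget:
                break
            steps = node.steps + [generate(question, node.steps, strategy)]
            calls += 1
            score = evaluate(question, steps)
            log.append("expand")
            # Mid-confidence steps get one repair attempt if budget allows.
            if prune_below <= score < repair_below and calls < budget:
                steps = repair(question, steps)
                calls += 1
                score = evaluate(question, steps)
                log.append("repair")
            if score < prune_below:
                log.append("prune")
                continue
            if is_final(steps):
                if best is None or score > best.score:
                    best = Node(steps, score)
                if score >= stop_above:      # confident enough: stop early
                    log.append("stop")
                    return Result(steps[-1], calls, log)
            else:
                frontier.append(Node(steps, score))
    if best is not None:
        return Result(best.steps[-1], calls, log)
    log.append("fallback")                   # nothing trustworthy was found
    return Result(fallback_answer, calls, log)


# Toy stand-ins for the model, the process oracle, and the repair step.
def toy_generate(question, steps, strategy):
    if strategy == "direct":
        return "answer: 20"                  # wrong left-to-right shortcut
    plan = ["3*4=12", "2+12=15", "answer: 14"]
    return plan[min(len(steps), 2)]


def toy_evaluate(question, steps):
    if "answer: 20" in steps:
        return 0.1
    if steps[-1] == "2+12=15":
        return 0.5                           # suspicious arithmetic step
    return {1: 0.7, 2: 0.8}.get(len(steps), 0.95)


def toy_repair(question, steps):
    fixes = {"2+12=15": "2+12=14"}
    return steps[:-1] + [fixes.get(steps[-1], steps[-1])]


def toy_is_final(steps):
    return steps[-1].startswith("answer:")


result = meta_control("2+3*4", toy_generate, toy_evaluate, toy_repair, toy_is_final)
tiny = meta_control("2+3*4", toy_generate, toy_evaluate, toy_repair, toy_is_final, budget=2)
print(result.answer, result.calls, tiny.answer)
```

In this toy run the controller prunes the low-scoring shortcut, repairs the suspicious arithmetic step, and stops once a final answer clears the confidence threshold. With a budget of two calls it cannot reach a final answer and falls back to abstaining, which is the kind of budget-dependent behavior the abstract attributes to the meta-controller.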
Problem

Research questions and friction points this paper is trying to address.

test-time reasoning
metacognitive control
reasoning trajectory
compute budget
chain-of-thought
Innovation

Methods, ideas, or system contributions that make the work stand out.

metacognitive control
chain-of-thought reasoning
test-time computation allocation
tree-structured search
training-free framework
Siyuan Ma
Nanyang Technological University
Bo Gao
Carnegie Mellon University
Zikai Xiao
Zhejiang University
Hailong Wang
Pacific Northwest National Laboratory
aerosols, cloud, atmospheric physics, high-latitude changes, climate modeling
Xinlei Yu
Beijing University of Posts and Telecommunications
Stochastic Geometry
Rui Qian
Fudan University
Jiayu Qian
City University of Hong Kong (Dongguan)
Luqi Gong
Zhejiang Lab
Yang Liu
Nanyang Technological University
Agent, Software Engineering, Cyber Security, Trustworthy AI, Software Security