TRIAGE: Evaluating Prospective Metacognitive Control in LLMs under Resource Constraints

📅 2026-05-13
🤖 AI Summary
This work addresses the limited prospective metacognitive control of large language models (LLMs) in resource-constrained settings, meaning their ability to plan, before any execution, which tasks to attempt, in what order, and how much computation to allocate to each. To measure this, the authors introduce TRIAGE, an evaluation framework that they present as the first to bring the human-cognition notion of prospective metacognition into LLM assessment. TRIAGE frames the problem as a single triadic decision: given a task pool and a token budget, the model must produce a complete execution plan in one pass that jointly specifies task selection, ordering, and per-task token allocation. The plan is then scored against an oracle that knows which tasks the model can solve and at what cost. Experiments with frontier and open-source models on several task sets reveal a pronounced deficit in this capability that persists even when explicit reasoning is enabled, which the authors argue matters for deploying resource-efficient AI agents.
📝 Abstract
Deploying language models as autonomous agents requires more than per-task accuracy: when an agent faces a queue of problems under a finite token budget, it must decide which to attempt, in what order, and how much compute to commit to each, all before any execution feedback is available. This is the prospective form of metacognitive control studied for decades in human cognition, yet whether language models possess it remains untested. We introduce TRIAGE, an evaluation framework in which a model receives a task pool and a token budget calibrated to its own baseline cost, and commits to a single ordered plan that jointly encodes selection, sequencing, and per-problem allocation. Plans are scored against an oracle with full knowledge of the model's solvability and cost on each problem, yielding a triage efficiency ratio on a common scale. We evaluate frontier and open-source models, with and without reasoning enabled, across competition mathematics, graduate-level science, code generation, and expert multidisciplinary knowledge, and find that current language models exhibit substantial gaps in prospective metacognitive control, revealing a previously unmeasured capability dimension with direct implications for resource-efficient agent deployment.
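The abstract does not spell out how a plan is executed or how the triage efficiency ratio is computed, so the following is only a minimal Python sketch under stated assumptions, not the authors' implementation. All names (`Problem`, `PlanStep`, `calibrated_budget`, `execute_plan`, `oracle_solved`, `triage_efficiency`) are hypothetical. It assumes: the budget is a fixed fraction of the model's total baseline cost; each plan step is charged its full allocation; execution stops at the first step that no longer fits; a problem counts as solved only if it is solvable and its allocation covers its cost; and every solved problem is worth one point. Under that unit reward, the oracle's best achievable score is found by taking solvable problems cheapest-first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Problem:
    pid: str
    solvable: bool  # oracle knowledge: can the model solve this problem at all?
    cost: int       # oracle knowledge: tokens the model needs (its baseline cost)


@dataclass(frozen=True)
class PlanStep:
    pid: str
    allocation: int  # tokens committed to this problem before any execution


def calibrated_budget(problems: dict[str, Problem], fraction: float) -> int:
    """Hypothetical calibration: a fixed fraction of the model's total baseline cost."""
    return round(fraction * sum(p.cost for p in problems.values()))


def execute_plan(plan: list[PlanStep], problems: dict[str, Problem], budget: int) -> int:
    """Run the committed plan in order and return the number of distinct problems solved.

    Assumptions: each step is charged its full allocation, execution stops at the
    first step whose allocation exceeds the remaining budget, and a problem counts
    as solved only if it is solvable and its allocation covers its cost.
    """
    remaining = budget
    solved: set[str] = set()  # a set, so repeated steps cannot be double-counted
    for step in plan:
        if step.allocation > remaining:
            break
        remaining -= step.allocation
        p = problems[step.pid]
        if p.solvable and step.allocation >= p.cost:
            solved.add(p.pid)
    return len(solved)


def oracle_solved(problems: dict[str, Problem], budget: int) -> int:
    """Most problems any plan could solve within the budget.

    With a unit reward per problem, taking solvable problems cheapest-first is optimal.
    """
    spent, count = 0, 0
    for cost in sorted(p.cost for p in problems.values() if p.solvable):
        if spent + cost > budget:
            break
        spent += cost
        count += 1
    return count


def triage_efficiency(plan: list[PlanStep], problems: dict[str, Problem], budget: int) -> float:
    """Plan's solved count divided by the oracle's; lies in [0, 1] under these rules."""
    best = oracle_solved(problems, budget)
    if best == 0:
        return 1.0  # convention (assumption): nothing was achievable, so there is no gap
    return execute_plan(plan, problems, budget) / best


# Usage: a toy pool with one unsolvable problem ("c") that a poor plan wastes tokens on.
problems = {
    "a": Problem("a", solvable=True, cost=300),
    "b": Problem("b", solvable=True, cost=500),
    "c": Problem("c", solvable=False, cost=400),
    "d": Problem("d", solvable=True, cost=200),
}
budget = calibrated_budget(problems, fraction=0.5)  # 700 tokens
wasteful = [PlanStep("c", 400), PlanStep("d", 200), PlanStep("a", 300)]
efficient = [PlanStep("d", 200), PlanStep("a", 300)]
print(triage_efficiency(wasteful, problems, budget))   # 0.5
print(triage_efficiency(efficient, problems, budget))  # 1.0
```

Because a solved problem must receive at least its cost and total allocations cannot exceed the budget, no plan can beat the cheapest-first oracle here, which is what keeps the ratio on a common 0 to 1 scale across models and budgets.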
Problem

Research questions and friction points this paper is trying to address.

prospective metacognitive control
resource constraints
language models
autonomous agents
task triage
Innovation

Methods, ideas, or system contributions that make the work stand out.

prospective metacognitive control
resource-constrained planning
TRIAGE framework
token budget allocation
autonomous language agents