TRIAGE: Evaluating Prospective Metacognitive Control in LLMs under Resource Constraints

📅 2026-05-13

🤖 AI Summary

This work addresses the limited prospective metacognitive control of large language models (LLMs) in resource-constrained settings. This limitation hinders their ability to plan task selection, sequencing, and compute allocation before execution begins. To close this gap, the authors introduce TRIAGE, an evaluation framework that brings human-inspired prospective metacognition into LLM assessment for the first time. TRIAGE unifies three decisions into one: given a task pool and a token budget calibrated to the model's own baseline cost, the model must commit in a single pass to an ordered plan that jointly encodes which tasks to attempt, in what order, and how many tokens to spend on each. Plans are scored against an oracle that knows the model's solvability and cost on every problem, yielding a triage efficiency ratio. Experiments on frontier and open-source models across diverse task sets reveal a pronounced deficiency in this capability that persists even when explicit reasoning is enabled, underscoring its importance for deploying efficient AI agents.
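To make the single-pass setup concrete, here is a minimal Python sketch of replaying a committed plan against a token budget with no execution feedback. Everything in it is an assumption for illustration, not the paper's procedure. The `Step` type, `execute_plan`, and the `cost`/`solvable` dictionaries are hypothetical names. Two rules are also assumed: each step spends its full allocation, and it succeeds only if the problem is solvable and the tokens spent cover its baseline cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One entry of a committed plan (hypothetical structure)."""
    problem_id: str
    allocation: int  # tokens committed to this problem


def execute_plan(plan, budget, cost, solvable):
    """Replay an ordered plan once, with no feedback, and count problems solved.

    Assumed rules (illustrative, not taken from the paper):
      - each step spends its full allocation, truncated to the remaining budget;
      - a problem is solved iff it is solvable and the spent tokens cover its cost.
    Problems left out of the plan are never attempted (selection).
    """
    remaining = budget
    solved = 0
    for step in plan:
        if remaining <= 0:
            break  # budget exhausted: later steps are never reached
        spent = min(step.allocation, remaining)
        remaining -= spent
        if solvable[step.problem_id] and spent >= cost[step.problem_id]:
            solved += 1
    return solved


# Toy pool: baseline token cost and solvability per problem (made-up numbers).
cost = {"a": 300, "b": 900, "c": 200}
solvable = {"a": True, "b": False, "c": True}

cheap_first = [Step("c", 200), Step("a", 300), Step("b", 900)]
costly_first = [Step("b", 900), Step("a", 300), Step("c", 200)]
print(execute_plan(cheap_first, 1000, cost, solvable))   # 2
print(execute_plan(costly_first, 1000, cost, solvable))  # 0
```

With the same pool and budget, putting the cheap solvable problems first solves two. Front-loading the expensive unsolvable problem drains the budget and solves none. Under these toy rules, this shows why order and allocation matter when the plan must be fixed in advance.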

📝 Abstract

Deploying language models as autonomous agents requires more than per-task accuracy: when an agent faces a queue of problems under a finite token budget, it must decide which to attempt, in what order, and how much compute to commit to each, all before any execution feedback is available. This is the prospective form of metacognitive control studied for decades in human cognition, yet whether language models possess it remains untested. We introduce TRIAGE, an evaluation framework in which a model receives a task pool and a token budget calibrated to its own baseline cost, and commits to a single ordered plan that jointly encodes selection, sequencing, and per-problem allocation. Plans are scored against an oracle with full knowledge of the model's solvability and cost on each problem, yielding a triage efficiency ratio on a common scale. We evaluate frontier and open-source models, with and without reasoning enabled, across competition mathematics, graduate-level science, code generation, and expert multidisciplinary knowledge, and find that current language models exhibit substantial gaps in prospective metacognitive control, revealing a previously unmeasured capability dimension with direct implications for resource-efficient agent deployment.
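The oracle comparison can be illustrated with a minimal Python sketch. This is not the paper's scoring code. The names `calibrated_budget`, `oracle_solved`, and `triage_efficiency` are hypothetical, and so are three assumptions: the budget is a fixed fraction of the model's total baseline cost, every solved problem is worth 1, and a solved problem consumes exactly its baseline cost. Under these assumptions, dividing by the oracle's score puts models with different baseline costs on a common [0, 1] scale. The paper's actual calibration and ratio may differ.

```python
def calibrated_budget(cost, fraction):
    """Hypothetical calibration: a fixed fraction of the model's total baseline cost."""
    return int(fraction * sum(cost.values()))


def oracle_solved(cost, solvable, budget):
    """Most problems solvable within budget given full knowledge of cost and solvability.

    Assuming each solved problem is worth 1 and needs exactly its baseline cost,
    taking solvable problems cheapest-first is optimal: any feasible set of k
    problems costs at least as much as the k cheapest solvable ones.
    """
    remaining = budget
    solved = 0
    for c in sorted(cost[p] for p in cost if solvable[p]):
        if c > remaining:
            break  # every later cost is at least c, so none of them fit either
        remaining -= c
        solved += 1
    return solved


def triage_efficiency(plan_solved, oracle):
    """Plan score over oracle score; 1.0 by convention when the oracle solves nothing."""
    return plan_solved / oracle if oracle > 0 else 1.0


# Toy pool with made-up numbers.
cost = {"a": 300, "b": 900, "c": 200}
solvable = {"a": True, "b": False, "c": True}
budget = calibrated_budget(cost, 0.5)          # 700
best = oracle_solved(cost, solvable, budget)   # 2 (problems c and a)
print(triage_efficiency(1, best))              # 0.5
```

The greedy oracle is only optimal because of the unit-reward assumption. If problems carried different values, the oracle would become a 0/1 knapsack and would need dynamic programming instead.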

Problem

Research questions and open problems this paper addresses.

prospective metacognitive control

resource constraints

language models

autonomous agents

task triage

Innovation

Methods, ideas, or system contributions that make the work stand out.

prospective metacognitive control

resource-constrained planning

TRIAGE framework