🤖 AI Summary
This work addresses the challenge that large language models struggle to anticipate the computational demands of tasks under strict global token budgets, leading to inefficient allocation of inference resources. To tackle this, the authors propose ROI-Reasoning, a novel framework that formulates budget-constrained reasoning as an ordered stochastic multiple-choice knapsack problem. By integrating an intrinsic metacognitive mechanism—trained via metacognitive fine-tuning and rationality-aware reinforcement learning—the model learns to estimate task difficulty and expected utility prior to inference, enabling strategic computation allocation. Empirical evaluations demonstrate that this approach significantly enhances overall performance across multiple mathematical reasoning benchmarks and substantially reduces decision regret under tight computational budgets.
📝 Abstract
Large language models (LLMs) can achieve strong reasoning performance with sufficient computation, but they do not inherently know how much computation a task requires. We study budgeted inference-time reasoning for multiple tasks under a strict global token constraint and formalize it as an Ordered Stochastic Multiple-Choice Knapsack Problem (OS-MCKP). This perspective highlights a meta-cognitive requirement -- anticipating task difficulty, estimating return on investment (ROI), and allocating computation strategically. We propose ROI-Reasoning, a two-stage framework that endows LLMs with intrinsic, budget-aware rationality. In the first stage, Meta-Cognitive Fine-Tuning teaches models to predict reasoning cost and expected utility before generation, enabling explicit solve-or-skip decisions. Next, Rationality-Aware Reinforcement Learning optimizes sequential decision making under a hard token budget, allowing models to learn long-horizon allocation strategies. Across budgeted mathematical reasoning benchmarks, ROI-Reasoning consistently improves overall score while substantially reducing regret under tight computation budgets.
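To make the solve-or-skip mechanism concrete, here is a minimal sketch of greedy ROI-based allocation over an ordered task stream under a hard token budget. The predicted costs, utilities, and the fixed ROI threshold are illustrative assumptions for exposition; the paper's actual estimates are produced by the fine-tuned model, not hand-coded.

```python
def allocate(tasks, budget, roi_threshold=1.0):
    """Greedy solve-or-skip over tasks arriving in a fixed order.

    tasks: list of (predicted_cost_tokens, predicted_utility) pairs,
           standing in for the model's meta-cognitive estimates.
    budget: hard global token budget shared across all tasks.
    Returns the per-task decisions and the unspent budget.
    """
    remaining = budget
    decisions = []
    for cost, utility in tasks:
        # ROI = expected utility per token of reasoning spent.
        roi = utility / cost if cost > 0 else float("inf")
        # Solve only if the task fits the remaining budget and clears
        # the ROI threshold; otherwise skip and save tokens.
        if cost <= remaining and roi >= roi_threshold:
            decisions.append("solve")
            remaining -= cost
        else:
            decisions.append("skip")
    return decisions, remaining


# Example: a mid-stream low-ROI task is skipped to preserve budget
# for a later high-ROI task.
decisions, left = allocate(
    [(100, 150), (500, 100), (200, 300)], budget=400
)
```

Note that this greedy rule is myopic; the paper's Rationality-Aware Reinforcement Learning stage exists precisely because long-horizon allocation under an ordered stochastic stream can beat such per-task thresholding.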