Are More Tokens Rational? Inference-Time Scaling in Language Models as Adaptive Resource Rationality

📅 2026-02-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates whether scaling computational resources during inference can, by itself, elicit resource-rational behavior in large language models: the ability to adapt reasoning strategies to task complexity without any explicit reward for computational efficiency. Using a variable attribution task whose complexity is controlled by the number of candidate variables and input-output trials, the authors compare instruction-tuned (IT) models against reinforcement-learning-trained large reasoning models (LRMs) on logical functions such as XOR and XNOR. The work presents the first evidence that inference-time scaling alone can shift model behavior from exhaustive search toward analytical reasoning, demonstrating emergent resource rationality without explicit cost constraints. Notably, LRMs remain robust as complexity increases, whereas IT models degrade markedly on the XOR and XNOR functions, underscoring the emergent nature of this capability.

📝 Abstract
Human reasoning is shaped by resource rationality: optimizing performance under constraints. Recently, inference-time scaling has emerged as a powerful paradigm to improve the reasoning performance of Large Language Models by expanding test-time computation. Specifically, instruction-tuned (IT) models explicitly generate long reasoning steps during inference, whereas Large Reasoning Models (LRMs) are trained by reinforcement learning to discover reasoning paths that maximize accuracy. However, it remains unclear whether resource rationality can emerge from such scaling without an explicit reward tied to computational cost. We introduce a Variable Attribution Task in which models infer which variables determine outcomes, given candidate variables, input-output trials, and predefined logical functions. By varying the number of candidate variables and trials, we systematically manipulate task complexity. Both model types exhibit a transition from brute-force to analytic strategies as complexity increases. IT models degrade on XOR and XNOR functions, whereas LRMs remain robust. These findings suggest that models can adjust their reasoning behavior in response to task complexity even without an explicit cost-based reward, providing compelling evidence that resource rationality is an emergent property of inference-time scaling itself.
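
To make the task concrete, below is a minimal, hypothetical sketch of how a single Variable Attribution Task instance could be generated, with complexity controlled by the number of candidate variables and trials as the abstract describes. The function names (`make_task`, `apply_function`), the parameterization, and the specific set of logical functions are illustrative assumptions, not the authors' released code.

```python
# Hypothetical sketch of a Variable Attribution Task instance.
# Assumption: a subset of "relevant" candidate variables determines the outcome
# via a predefined logical function; the model must attribute which variables matter.
import random
from dataclasses import dataclass


@dataclass
class TaskInstance:
    candidates: list[str]           # names of all candidate variables
    relevant: list[str]             # ground-truth variables that drive the outcome
    function: str                   # logical function applied to the relevant variables
    trials: list[tuple[dict, int]]  # (full variable assignment, observed outcome) pairs


def apply_function(name: str, bits: list[int]) -> int:
    """Evaluate a predefined logical function over the relevant variables' values."""
    if name == "XOR":
        return sum(bits) % 2
    if name == "XNOR":
        return 1 - (sum(bits) % 2)
    if name == "AND":
        return int(all(bits))
    if name == "OR":
        return int(any(bits))
    raise ValueError(f"unknown function: {name}")


def make_task(n_candidates: int, n_relevant: int, n_trials: int,
              function: str, seed: int = 0) -> TaskInstance:
    """Build one task; complexity grows with n_candidates and n_trials."""
    rng = random.Random(seed)
    candidates = [f"x{i}" for i in range(n_candidates)]
    relevant = rng.sample(candidates, n_relevant)
    trials = []
    for _ in range(n_trials):
        assignment = {v: rng.randint(0, 1) for v in candidates}
        outcome = apply_function(function, [assignment[v] for v in relevant])
        trials.append((assignment, outcome))
    return TaskInstance(candidates, relevant, function, trials)


if __name__ == "__main__":
    # Example: 6 candidate variables, 2 of which determine the outcome via XOR,
    # observed over 8 input-output trials.
    task = make_task(n_candidates=6, n_relevant=2, n_trials=8, function="XOR")
    print("candidates:", task.candidates)
    print("ground truth:", task.relevant)
    for assignment, outcome in task.trials:
        print(assignment, "->", outcome)
```

In such a setup, the prompt shown to the model would contain only the candidate variable names and the trials, while the relevant set serves as the ground truth for scoring; solving it by brute force means checking every candidate subset, whereas an analytic strategy rules variables in or out from informative trial pairs.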
Problem

Research questions and friction points this paper is trying to address.

resource rationality
inference-time scaling
reasoning strategies
task complexity
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

inference-time scaling
resource rationality
Large Reasoning Models
emergent behavior
reasoning strategies