AI Summary
This work addresses the inefficiency of large language models in reasoning: they often overcompute on simple questions while undercomputing on difficult ones, failing to balance efficiency and performance. The authors propose a difficulty-aware computation allocation mechanism that requires neither external annotations nor user-specified budgets. Leveraging the policy's internal signals, the method dynamically adjusts reasoning depth: it estimates problem difficulty via group-based rollouts, maps the estimate to two non-negative gates that modulate the reward function, and introduces a length-dependent shaping term to steer reasoning length. Experiments across multiple models and benchmarks demonstrate significant efficiency gains: token consumption drops by over 60% on simple tasks without accuracy loss, while deeper reasoning improves performance on complex tasks.
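The difficulty signal described above can be sketched as a failure rate over a group of sampled rollouts. This is a minimal illustration, not the paper's implementation; the function name and interface are assumptions.

```python
def estimate_difficulty(rollout_correct: list[bool]) -> float:
    """Estimate instance difficulty as the failure rate over a group of
    rollouts sampled from the current policy (illustrative sketch).

    rollout_correct: per-rollout binary correctness for one problem.
    Returns a difficulty in [0, 1]: 0 = all rollouts solved it (easy),
    1 = none did (hard).
    """
    if not rollout_correct:
        raise ValueError("need at least one rollout")
    success_rate = sum(rollout_correct) / len(rollout_correct)
    return 1.0 - success_rate
```

For example, if 6 of 8 rollouts answer correctly, the estimated difficulty is 0.25, so the easy-side behavior would dominate for this instance.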
Abstract
The emergence of large reasoning models demonstrates that scaling inference-time compute significantly enhances performance on complex tasks. However, these models often fall into another trap: overthinking simple problems, where repetitive rationales yield minimal accuracy gains at a disproportionately high cost. This motivates adaptive reasoning: dynamically aligning reasoning depth with instance difficulty. In this paper, we study adaptive reasoning from an optimality perspective, formalizing it as a utility maximization problem in which tokens are allocated until the marginal accuracy gain falls below the incremental cost. Based on this, we propose CODA (Compute Allocation by Difficulty Awareness), a method that operationalizes this principle by allocating tokens via a policy-internal difficulty signal. Specifically, CODA estimates difficulty via group-based rollouts and maps it to two non-negative gates that modulate a length-dependent shaping term on top of the binary base reward. The easy-side gate penalizes verbosity on simple instances, whereas the hard-side gate encourages more deliberative rollouts on challenging ones. Across model scales and benchmarks, CODA achieves adaptive reasoning without external annotations or user-provided budgets: on easy tasks, CODA reduces token costs by over 60% while maintaining strong accuracy, whereas on hard tasks it incentivizes more deliberative rollouts to maximize performance.
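The gated reward shaping described in the abstract can be sketched as a binary base reward plus a length-dependent term scaled by two non-negative gates. The gate forms, the reference length, and the coefficient `alpha` below are illustrative assumptions, not the paper's actual parameterization.

```python
def shaped_reward(correct: bool, length: int, difficulty: float,
                  ref_length: int = 1024, alpha: float = 0.1) -> float:
    """Binary base reward plus a length-dependent shaping term modulated
    by two non-negative gates (illustrative sketch of CODA's scheme).

    difficulty: estimated instance difficulty in [0, 1].
    The easy-side gate penalizes length on simple instances; the
    hard-side gate rewards longer, more deliberative rollouts.
    """
    base = 1.0 if correct else 0.0
    # Hinge-style gates are an assumption: easy-side fires when
    # difficulty < 0.5, hard-side when difficulty > 0.5; both are >= 0.
    easy_gate = max(0.0, 1.0 - 2.0 * difficulty)
    hard_gate = max(0.0, 2.0 * difficulty - 1.0)
    rel_len = length / ref_length  # length-dependent shaping signal
    shaping = alpha * (hard_gate - easy_gate) * rel_len
    return base + shaping
```

Under this sketch, a correct but long answer on an easy instance earns less than a correct short one, while on a hard instance longer rollouts are rewarded; near difficulty 0.5 both gates vanish and only the base reward remains.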