Learning Adaptive Control of Reasoning Effort

📅 2025-10-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge in AI inference of pre-specifying fixed resource budgets while balancing problem difficulty and user preferences, this paper proposes Adaptive Effort Control (AEC), a reinforcement learning–based method that jointly optimizes chain-of-thought reasoning and dynamic token budget allocation. AEC allocates resources relative to the current average inference length, eliminating the need for prior knowledge of task difficulty, and yields inference-length control that generalizes across datasets and training stages while remaining continuously adjustable. Evaluated on models ranging from 1.5B to 32B parameters, AEC compresses average inference length by approximately 3× while matching or exceeding baseline accuracy, substantially improving the accuracy–cost trade-off curve. The core innovation lies in decoupling difficulty estimation from budget assignment, enabling users to flexibly adjust inference quality, latency, and computational cost via a single continuous control parameter at inference time.

📝 Abstract
Increasing the thinking budget of AI models can significantly improve accuracy, but not all questions warrant the same amount of reasoning. Users may prefer to allocate different amounts of reasoning effort depending on how they value output quality versus latency and cost. To leverage this tradeoff effectively, users need fine-grained control over the amount of thinking used for a particular query, but few approaches enable such control. Existing methods require users to specify the absolute number of desired tokens, but this requires knowing the difficulty of the problem beforehand to appropriately set the token budget for a query. To address these issues, we propose Adaptive Effort Control, a self-adaptive reinforcement learning method that trains models to use a user-specified fraction of tokens relative to the current average chain-of-thought length for each query. This approach eliminates dataset- and phase-specific tuning while producing better cost-accuracy tradeoff curves compared to standard methods. Users can dynamically adjust the cost-accuracy trade-off through a continuous effort parameter specified at inference time. We observe that the model automatically learns to allocate resources proportionally to the task difficulty and, across model scales ranging from 1.5B to 32B parameters, our approach enables approximately 3x reduction in chain-of-thought length while maintaining or improving performance relative to the base model used for RL training.
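The abstract's core mechanism is a relative length target: the user supplies an effort fraction, and the reward penalizes chain-of-thought lengths that deviate from that fraction of the current average length, which itself tracks the policy as it compresses during training. The paper does not publish this exact formula; the sketch below is a hypothetical illustration of that idea, with invented names (`effort_penalized_reward`, `update_avg_length`) and an assumed penalty weight and exponential-moving-average momentum.

```python
def effort_penalized_reward(task_reward, cot_length, avg_length, effort, penalty=0.5):
    """Hypothetical reward shaping: the target budget is the user-specified
    effort fraction of the current average chain-of-thought length, so no
    absolute token count (and hence no prior difficulty estimate) is needed."""
    target = effort * avg_length
    deviation = abs(cot_length - target) / max(target, 1.0)
    return task_reward - penalty * deviation

def update_avg_length(avg_length, cot_length, momentum=0.99):
    """Running (exponential) average of chain-of-thought length; as RL
    compresses the policy's reasoning, the relative target shrinks with it."""
    return momentum * avg_length + (1.0 - momentum) * cot_length
```

Because the target is relative rather than absolute, the same effort value remains meaningful across datasets and training phases: at inference time the user only turns one continuous knob, and the model spends more of its (shrinking) budget on harder queries.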
Problem

Research questions and friction points this paper is trying to address.

Enabling fine-grained user control over AI reasoning effort allocation
Eliminating need for dataset-specific tuning of token budgets
Automatically adapting computation to task difficulty for efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive reinforcement learning controls reasoning effort
User-specified token budget as a fraction of the current average chain-of-thought length
Dynamic cost-accuracy trade-off adjustment through effort parameter