🤖 AI Summary
Large reasoning models often suffer from inefficiency, generating excessive reasoning steps and struggling to balance response length and accuracy. To address this, we propose a difficulty-aware adaptive reasoning framework. First, we construct a problem-difficulty self-assessment model. Second, we design a two-stage training paradigm: cold-start supervised fine-tuning followed by difficulty-aware reinforcement learning. Third, we introduce a length-triggered tagging mechanism that lets users explicitly and granularly control the inference budget via natural-language instructions. Evaluated on AIME2024/2025, our method reduces average response length by 10.06%/12.14% while improving accuracy. On MATH500 and GSM8K, it shortens responses by 62.05% and 91.04%, respectively, without performance degradation. Our key contribution is the first unified framework integrating difficulty modeling, dynamic budget allocation, and natural-language-level budget specification for inference optimization.
📝 Abstract
Modern large reasoning models demonstrate impressive problem-solving capabilities by employing sophisticated reasoning strategies. However, they often struggle to balance efficiency and effectiveness, frequently generating unnecessarily lengthy reasoning chains for simple problems. In this work, we propose AdaCtrl, a novel framework that supports both difficulty-aware adaptive allocation of the reasoning budget and explicit user control over reasoning depth. AdaCtrl dynamically adjusts its reasoning length based on self-assessed problem difficulty, while also allowing users to manually set the budget to prioritize either efficiency or effectiveness. This is achieved through a two-stage training pipeline: an initial cold-start fine-tuning phase that instills the ability to self-assess difficulty and adjust the reasoning budget, followed by a difficulty-aware reinforcement learning (RL) stage that refines the model's adaptive reasoning strategies and calibrates its difficulty assessments to its evolving capabilities during online training. To enable intuitive user interaction, we design explicit length-triggered tags that serve as a natural interface for budget control. Empirical results show that AdaCtrl adapts its reasoning length to estimated difficulty. Compared to a standard training baseline that also incorporates fine-tuning and RL, it yields performance improvements while reducing response length by 10.06% and 12.14% on the more challenging AIME2024 and AIME2025 datasets, which require elaborate reasoning, and by 62.05% and 91.04% on the MATH500 and GSM8K datasets, where more concise responses are sufficient. Furthermore, AdaCtrl enables precise user control over the reasoning budget, allowing responses to be tailored to specific needs.
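The length-triggered tags described above can be pictured as a thin prompt-construction layer: when the user specifies a budget, an explicit tag is prepended to the question; otherwise the prompt is left untagged and the model falls back to its own difficulty self-assessment. A minimal sketch follows; note that the tag strings `[Easy]` and `[Hard]`, the budget names, and the `build_prompt` helper are illustrative assumptions, not the paper's actual interface.

```python
from typing import Optional

# Hypothetical budget tags -- placeholders for whatever length-triggered
# tags the trained model actually recognizes.
BUDGET_TAGS = {
    "concise": "[Easy]",    # request a short, direct answer
    "elaborate": "[Hard]",  # request a long, detailed reasoning chain
}

def build_prompt(question: str, budget: Optional[str] = None) -> str:
    """Prepend an explicit budget tag when the user wants manual control;
    leave the prompt untagged so the model self-assesses difficulty."""
    if budget is None:
        return question
    return f"{BUDGET_TAGS[budget]} {question}"

# Untagged: the model allocates its own budget from self-assessed difficulty.
print(build_prompt("What is 17 * 24?"))
# Tagged: the user explicitly requests a concise response.
print(build_prompt("What is 17 * 24?", budget="concise"))
```

The same idea extends naturally to plain natural-language instructions ("answer briefly") rather than bracketed tags; the tag form simply makes the budget signal unambiguous during training and inference.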