🤖 AI Summary
To address the resource waste and response latency caused by fixed decoding lengths in large language model (LLM) inference, this paper proposes an adaptive token budget control framework. Methodologically, it introduces: (1) a dynamic cost estimation mechanism grounded in query difficulty; (2) a novel two-stage training paradigm coupled with budget-guided GRPO, a reinforcement learning algorithm enabling user-specified token ceilings, real-time generation interruption, and predictable decoding latency; and (3) an integrated strategy combining controllable decoding with dynamic token scheduling. Evaluated on the MATH benchmark, the framework reduces response length by up to 74.47% while incurring less than a 0.3% accuracy drop, yielding substantial gains in inference efficiency and user experience and demonstrating both computational savings and robust task performance under stringent budget constraints.
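The summary describes a reward that must balance answer correctness against adherence to a self-predicted token budget. The paper's exact reward is not given here, so the following is a minimal illustrative sketch under assumed design choices: correct answers earn full reward, discounted by how far the realized response length deviates from the predicted budget (the function name, the `alpha` weight, and the linear penalty are all assumptions, not the paper's formulation).

```python
def budget_guided_reward(correct: bool, predicted_budget: int,
                         actual_length: int, alpha: float = 0.5) -> float:
    """Toy budget-aware reward (illustrative, not the paper's exact formula).

    Wrong answers get zero reward, so length savings can never outweigh
    correctness. Correct answers are discounted by the relative deviation
    between the realized length and the model's own predicted budget.
    """
    if not correct:
        return 0.0
    # Relative deviation, clipped to [0, 1] so the reward stays non-negative.
    deviation = abs(actual_length - predicted_budget) / max(predicted_budget, 1)
    return 1.0 - alpha * min(deviation, 1.0)
```

A reward shaped this way penalizes both overshooting the budget (wasted tokens) and drastically undershooting it (an unreliable estimate), which is one plausible way to make the budget prediction trustworthy enough for users to act on.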
📝 Abstract
Recently, large reasoning models have demonstrated exceptional performance on a variety of tasks. However, these models inefficiently over-process both trivial and complex queries, leading to wasted resources and prolonged user latency. To address this challenge, we propose SelfBudgeter, a self-adaptive, controllable reasoning strategy for efficient reasoning. Our approach adopts a dual-phase training paradigm: first, the model learns to pre-estimate the reasoning cost based on the difficulty of the query; then, we introduce budget-guided GRPO for reinforcement learning, which effectively maintains accuracy while reducing output length. SelfBudgeter allows users to anticipate generation time and make informed decisions about continuing or interrupting the process. Furthermore, our method enables direct control of reasoning length by pre-filling the token budget. Experimental results demonstrate that SelfBudgeter can rationally allocate budgets according to problem complexity, achieving up to 74.47% response-length compression on the MATH benchmark while maintaining nearly undiminished accuracy.
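The abstract states that reasoning length can be manipulated by pre-filling the token budget. One way this could work in practice is to pre-fill the assistant turn with a budget tag so the model continues generation under a user-chosen ceiling; the sketch below assumes a hypothetical `<budget>...</budget>` tag format and a plain chat-template layout, neither of which is specified in the abstract.

```python
def build_prefilled_prompt(question: str, budget_tokens: int) -> str:
    """Pre-fill the assistant turn with a user-specified token budget.

    The <budget> tag format and the User/Assistant layout are illustrative
    assumptions; a real deployment would use the model's actual chat template
    and whatever budget marker it was trained to emit.
    """
    return (
        f"User: {question}\n"
        f"Assistant: <budget>{budget_tokens}</budget>\n"
    )

# Example: cap the reasoning for a simple arithmetic query at 64 tokens.
prompt = build_prefilled_prompt("What is 17 * 24?", 64)
```

Because the model was trained to emit its own budget first, overwriting that prefix lets the user trade accuracy for latency per query without retraining, which is the "direct control" the abstract refers to.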