🤖 AI Summary
Large language models (LLMs) that scale test-time computation to improve reasoning incur high inference latency and deployment costs.
Method: We propose a budget-aware controllable inference framework that introduces learnable control tokens representing the remaining computational budget, coupled with a two-stage training paradigm: (i) supervised fine-tuning to establish foundational control capability, followed by (ii) curriculum-based reinforcement learning with a length-aware reward function to enable fine-grained, dynamic regulation of chain-of-thought (CoT) length.
Contribution/Results: This is the first inference control mechanism supporting flexible, user-specified computational budgets without significant accuracy degradation. Experiments on multiple mathematical reasoning benchmarks demonstrate substantial improvements over strong baselines; our method consistently achieves higher accuracy across diverse computational budgets, thereby enhancing LLM practicality and scalability in low-latency, cost-sensitive deployment scenarios.
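The control-token mechanism above can be sketched in a few lines: interleave a token announcing the remaining budget into the generated sequence at a fixed interval. The token format (`<budget:N>`), the interval, and the function name here are illustrative assumptions, not details taken from the paper.

```python
def insert_budget_tokens(tokens, total_budget, interval=128):
    """Interleave control tokens that announce the remaining token
    budget every `interval` generated tokens (hypothetical format)."""
    out = []
    for i, tok in enumerate(tokens):
        if i % interval == 0:
            # Remaining budget, floored at zero once the budget is spent.
            remaining = max(total_budget - i, 0)
            out.append(f"<budget:{remaining}>")
        out.append(tok)
    return out
```

In a real system these tokens would be added to the tokenizer's vocabulary and emitted during decoding rather than in post-processing; the point of the sketch is only that the model is periodically re-informed of its shrinking budget.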
📝 Abstract
Recent advancements in Large Language Models (LLMs) have leveraged increased test-time computation to enhance reasoning capabilities, a strategy that, while effective, incurs significant latency and resource costs, limiting their applicability in real-world time-constrained or cost-sensitive scenarios. This paper introduces BudgetThinker, a novel framework designed to empower LLMs with budget-aware reasoning, enabling precise control over the length of their thought processes. We propose a methodology that periodically inserts special control tokens during inference to continuously inform the model of its remaining token budget. This approach is coupled with a comprehensive two-stage training pipeline, beginning with Supervised Fine-Tuning (SFT) to familiarize the model with budget constraints, followed by a curriculum-based Reinforcement Learning (RL) phase that utilizes a length-aware reward function to optimize for both accuracy and budget adherence. We demonstrate that BudgetThinker significantly surpasses strong baselines in maintaining performance across a variety of reasoning budgets on challenging mathematical benchmarks. Our method provides a scalable and effective solution for developing efficient and controllable LLM reasoning, making advanced models more practical for deployment in resource-constrained and real-time environments.
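A length-aware reward of the kind the abstract describes might combine task accuracy with a budget-adherence term; the additive form, the linear deviation penalty, and the weight `alpha` below are assumptions for illustration, not the paper's exact function.

```python
def length_aware_reward(correct, length, budget, alpha=0.5):
    """Hypothetical RL reward: task correctness plus a bonus for
    staying close to the target token budget."""
    # Adherence is 1.0 at the exact budget and decays linearly to 0.0
    # once the deviation reaches the budget itself.
    adherence = 1.0 - min(abs(length - budget) / budget, 1.0)
    return float(correct) + alpha * adherence
```

Under this shaping, a correct answer at exactly the budgeted length scores highest, while a correct but badly over-length answer is penalized, which is the trade-off the curriculum RL phase is meant to optimize.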