🤖 AI Summary
Large language models (LLMs) that scale test-time computation to improve reasoning incur high inference latency and deployment costs.
Method: We propose a budget-aware controllable inference framework that introduces learnable control tokens representing the remaining computational budget, coupled with a two-stage training paradigm: (i) supervised fine-tuning to establish foundational control capability, followed by (ii) curriculum-based reinforcement learning with a length-aware reward function to enable fine-grained, dynamic regulation of chain-of-thought (CoT) length.
Contribution/Results: This is the first inference control mechanism supporting flexible, user-specified computational budgets without significant accuracy degradation. Experiments on multiple mathematical reasoning benchmarks demonstrate substantial improvements over strong baselines; our method consistently achieves higher accuracy across diverse computational budgets, thereby enhancing LLM practicality and scalability in low-latency, cost-sensitive deployment scenarios.
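The control-token mechanism above can be sketched in a few lines: interleave a token announcing the remaining budget into the generated sequence at a fixed interval. The token format (`<budget:N>`), the interval, and the function name here are illustrative assumptions, not details taken from the paper.

```python
def insert_budget_tokens(tokens, total_budget, interval=128):
    """Interleave control tokens that announce the remaining token
    budget every `interval` generated tokens (hypothetical format)."""
    out = []
    for i, tok in enumerate(tokens):
        if i % interval == 0:
            # Remaining budget, floored at zero once the budget is spent.
            remaining = max(total_budget - i, 0)
            out.append(f"<budget:{remaining}>")
        out.append(tok)
    return out
```

In a real system these tokens would be added to the tokenizer's vocabulary and emitted during decoding rather than in post-processing; the point of the sketch is only that the model is periodically re-informed of its shrinking budget.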
📝 Abstract
Recent advancements in Large Language Models (LLMs) have leveraged increased test-time computation to enhance reasoning capabilities, a strategy that, while effective, incurs significant latency and resource costs, limiting their applicability in real-world time-constrained or cost-sensitive scenarios. This paper introduces BudgetThinker, a novel framework designed to empower LLMs with budget-aware reasoning, enabling precise control over the length of their thought processes. We propose a methodology that periodically inserts special control tokens during inference to continuously inform the model of its remaining token budget. This approach is coupled with a comprehensive two-stage training pipeline, beginning with Supervised Fine-Tuning (SFT) to familiarize the model with budget constraints, followed by a curriculum-based Reinforcement Learning (RL) phase that utilizes a length-aware reward function to optimize for both accuracy and budget adherence. We demonstrate that BudgetThinker significantly surpasses strong baselines in maintaining performance across a variety of reasoning budgets on challenging mathematical benchmarks. Our method provides a scalable and effective solution for developing efficient and controllable LLM reasoning, making advanced models more practical for deployment in resource-constrained and real-time environments.
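A length-aware reward of the kind the abstract describes might combine task accuracy with a budget-adherence term; the additive form, the linear deviation penalty, and the weight `alpha` below are assumptions for illustration, not the paper's exact function.

```python
def length_aware_reward(correct, length, budget, alpha=0.5):
    """Hypothetical RL reward: task correctness plus a bonus for
    staying close to the target token budget."""
    # Adherence is 1.0 at the exact budget and decays linearly to 0.0
    # once the deviation reaches the budget itself.
    adherence = 1.0 - min(abs(length - budget) / budget, 1.0)
    return float(correct) + alpha * adherence
```

Under this shaping, a correct answer at exactly the budgeted length scores highest, while a correct but badly over-length answer is penalized, which is the trade-off the curriculum RL phase is meant to optimize.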