BudgetThinker: Empowering Budget-aware LLM Reasoning with Control Tokens

📅 2025-08-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) incur high inference latency and deployment costs when test-time computation is scaled up. Method: We propose a budget-aware controllable inference framework that introduces learnable control tokens to dynamically represent the remaining computational budget, coupled with a two-stage training paradigm: (i) supervised fine-tuning to establish foundational control capability, followed by (ii) curriculum-based reinforcement learning with a length-aware reward function to enable fine-grained, dynamic regulation of chain-of-thought (CoT) length. Contribution/Results: This is the first inference control mechanism supporting flexible, user-specified computational budgets without significant accuracy degradation. Experiments on multiple mathematical reasoning benchmarks demonstrate substantial improvements over strong baselines; the method consistently achieves higher accuracy across diverse computational budgets, thereby enhancing LLM practicality and scalability in low-latency, cost-sensitive deployment scenarios.

📝 Abstract
Recent advancements in Large Language Models (LLMs) have leveraged increased test-time computation to enhance reasoning capabilities, a strategy that, while effective, incurs significant latency and resource costs, limiting their applicability in real-world time-constrained or cost-sensitive scenarios. This paper introduces BudgetThinker, a novel framework designed to empower LLMs with budget-aware reasoning, enabling precise control over the length of their thought processes. We propose a methodology that periodically inserts special control tokens during inference to continuously inform the model of its remaining token budget. This approach is coupled with a comprehensive two-stage training pipeline, beginning with Supervised Fine-Tuning (SFT) to familiarize the model with budget constraints, followed by a curriculum-based Reinforcement Learning (RL) phase that utilizes a length-aware reward function to optimize for both accuracy and budget adherence. We demonstrate that BudgetThinker significantly surpasses strong baselines in maintaining performance across a variety of reasoning budgets on challenging mathematical benchmarks. Our method provides a scalable and effective solution for developing efficient and controllable LLM reasoning, making advanced models more practical for deployment in resource-constrained and real-time environments.
Problem

Research questions and friction points this paper is trying to address.

Controls LLM reasoning length with budget constraints
Reduces latency and resource costs in LLM inference
Optimizes accuracy while adhering to token budgets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Control tokens for budget-aware reasoning
Two-stage training with SFT and RL
Length-aware reward function optimization
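The length-aware reward mentioned above combines answer correctness with adherence to the token budget. A minimal sketch of one plausible form, assuming a linear penalty on budget overshoot (the paper's exact functional form and coefficients are not given here, so this shape and the `alpha` parameter are assumptions):

```python
# Hypothetical length-aware reward: rewards correctness and penalizes
# exceeding the token budget. The exact form used in BudgetThinker may
# differ; this only illustrates the accuracy-plus-length-penalty idea.

def length_aware_reward(correct: bool, used: int, budget: int,
                        alpha: float = 0.5) -> float:
    """Reward = accuracy term minus a penalty growing with budget overshoot."""
    accuracy = 1.0 if correct else 0.0
    # Normalized overshoot: 0 when within budget, grows linearly past it.
    overshoot = max(0, used - budget) / budget
    return accuracy - alpha * overshoot
```

Under this shape, staying within budget is never penalized, so the RL phase can trade a small amount of accuracy for reliable budget adherence only when the model actually overshoots.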
👥 Authors
Hao Wen
Institute for AI Industry Research (AIR), Tsinghua University
Xinrui Wu
Institute for AI Industry Research (AIR), Tsinghua University
Yi Sun
Institute for AI Industry Research (AIR), Tsinghua University
Feifei Zhang
Institute for AI Industry Research (AIR), Tsinghua University
Liye Chen
Institute for AI Industry Research (AIR), Tsinghua University
Jie Wang
Institute for AI Industry Research (AIR), Tsinghua University
Yunxin Liu
IEEE Fellow, Guoqiang Professor, Institute for AI Industry Research (AIR), Tsinghua University
Research interests: Mobile Computing, Edge Computing, AIoT, Systems, Networking
Ya-Qin Zhang
Institute for AI Industry Research (AIR), Tsinghua University
Yuanchun Li
Institute for AI Industry Research (AIR), Tsinghua University
Research interests: Mobile Computing, Artificial Intelligence