Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization

📅 2025-01-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) often rely on excessively long chain-of-thought (CoT) reasoning regardless of problem difficulty, wasting computation on simple problems while under-allocating it to complex ones. Method: the paper proposes a utility-maximization framework constrained by learnable inference budgets. It combines supervised fine-tuning of LLaMA3.1-8B Instruct with Inference Budget-Constrained Policy Optimization (IBPO), a utility-driven reinforcement learning approach that departs from the conventional single-modality long-CoT paradigm. Contribution/Results: it introduces the first learnable inference-budget constraint, enabling difficulty-aware, adaptive control of CoT length. On MATH500, the method achieves absolute accuracy gains of +4.14% and +5.74% (relative gains of +8.08% and +11.2%) over the baseline under 2.16x and 4.32x inference budgets, respectively; these improvements are roughly twice those of self-consistency at the same budget levels.
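The summary above can be made concrete with a schematic objective. The sketch below is a hypothetical simplification, not the paper's exact IBPO formulation: it rewards task correctness and applies a Lagrangian-style penalty (with an assumed multiplier `lam`) to any chain-of-thought tokens spent beyond a per-query budget, so that overspending on easy queries lowers utility.

```python
# Hypothetical sketch of a budget-constrained utility (assumed names:
# budgeted_utility, lam). Not the paper's exact IBPO objective.

def budgeted_utility(correct: bool, cot_tokens: int, budget: int,
                     lam: float = 0.01) -> float:
    """Utility = task reward minus a penalty on tokens beyond the budget."""
    reward = 1.0 if correct else 0.0
    overrun = max(0, cot_tokens - budget)
    return reward - lam * overrun

# A query solved within budget keeps full utility...
print(budgeted_utility(True, cot_tokens=120, budget=256))   # 1.0
# ...while a needlessly long chain on the same query is penalized.
print(budgeted_utility(True, cot_tokens=1024, budget=256))  # -6.68
```

Under such an objective, a policy maximizing expected utility is pushed toward short chains on queries it can already solve cheaply, reserving long chains for queries where the extra tokens change the outcome.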

📝 Abstract
Solving mathematics problems has been an intriguing capability of large language models, and many efforts have been made to improve reasoning by extending reasoning length, such as through self-correction and extensive long chain-of-thoughts. While promising in problem-solving, advanced long reasoning chain models exhibit an undesired single-modal behavior, where trivial questions require unnecessarily tedious long chains of thought. In this work, we propose a way to allow models to be aware of inference budgets by formulating it as utility maximization with respect to an inference budget constraint, hence naming our algorithm Inference Budget-Constrained Policy Optimization (IBPO). In a nutshell, models fine-tuned through IBPO learn to "understand" the difficulty of queries and allocate inference budgets to harder ones. With different inference budgets, our best models are able to have a 4.14% and 5.74% absolute improvement (8.08% and 11.2% relative improvement) on MATH500 using 2.16x and 4.32x inference budgets respectively, relative to LLaMA3.1 8B Instruct. These improvements are approximately 2x those of self-consistency under the same budgets.
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Mathematical Problem Solving
Adaptive Thinking Depth
Innovation

Methods, ideas, or system contributions that make the work stand out.

Inference Budget-Constrained Policy Optimization (IBPO)
Adaptive Reasoning Depth
Enhanced Mathematical Problem Solving