Not All Turns Are Equally Hard: Adaptive Thinking Budgets For Efficient Multi-Turn Reasoning

📅 2026-04-06
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the inefficiency in multi-turn reasoning caused by static computation allocation, which often leads to overthinking on simple steps and suboptimal overall performance. The authors formulate this challenge as a sequential computation allocation task and propose Turn-Adaptive Budgets (TAB), a reinforcement learning-based strategy that dynamically allocates token budgets for each reasoning turn. Their approach introduces an adaptive budgeting mechanism that leverages both dialogue history and sub-question planning, optimized through a multi-objective Markov decision process. Two variants are supported: one using only conversation history (TAB) and another incorporating full sub-question planning (TAB All-SubQ). Evaluated on mathematical reasoning benchmarks, the method achieves comparable accuracy while reducing token usage by 35%–40%, significantly outperforming existing static and dynamic budgeting baselines.
๐Ÿ“ Abstract
As LLM reasoning performance plateaus, improving inference-time compute efficiency is crucial to mitigate overthinking and long thinking traces even for simple queries. Prior approaches, including length regularization, adaptive routing, and difficulty-based budget allocation, primarily focus on single-turn settings and fail to address the sequential dependencies inherent in multi-turn reasoning. In this work, we formulate multi-turn reasoning as a sequential compute allocation problem and model it as a multi-objective Markov Decision Process. We propose TAB (Turn-Adaptive Budgets), a budget allocation policy trained via Group Relative Policy Optimization (GRPO) that learns to maximize task accuracy while respecting a global per-problem token constraint. Consequently, TAB takes the conversation history as input and learns to adaptively allocate smaller budgets to easier turns, saving an appropriate number of tokens for the crucial harder reasoning steps. Our experiments on mathematical reasoning benchmarks demonstrate that TAB achieves a superior accuracy-tokens tradeoff, saving up to 35% of tokens while maintaining accuracy over static and off-the-shelf LLM budget baselines. Further, for systems where a plan of all sub-questions is available a priori, we propose TAB All-SubQ, a budget allocation policy that allocates tokens based on the conversation history and all past and future sub-questions, saving up to 40% of tokens over baselines.
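The per-turn budgeting loop described in the abstract can be illustrated with a toy sketch. Everything below is an assumption for illustration: the word-count difficulty heuristic, the function names, and the `GLOBAL_CAP` value are hypothetical stand-ins, whereas the paper's TAB learns this history-to-budget mapping with a GRPO-trained policy.

```python
# Toy sketch of turn-adaptive budget allocation under a global per-problem cap.
# NOT the paper's method: the difficulty heuristic below is a hand-written
# placeholder for the learned policy that TAB trains via GRPO.

GLOBAL_CAP = 2048  # assumed global per-problem token budget


def estimate_difficulty(history, sub_question):
    """Toy difficulty proxy in [0, 1]: longer sub-questions score higher.
    (The history argument is unused here; a learned policy would condition
    on it, as TAB does.)"""
    return min(len(sub_question.split()) / 30.0, 1.0)


def allocate_budget(history, sub_question, remaining, min_budget=64):
    """Give every turn at least min_budget tokens, and harder turns a
    larger fraction of the remaining global cap, never exceeding it."""
    d = estimate_difficulty(history, sub_question)
    budget = int(min_budget + d * max(0, remaining - min_budget))
    return min(budget, remaining)


def run_episode(sub_questions):
    """Allocate a budget per turn, spending down the global cap."""
    remaining = GLOBAL_CAP
    budgets, history = [], []
    for sq in sub_questions:
        b = allocate_budget(history, sq, remaining)
        budgets.append(b)
        remaining -= b
        history.append(sq)  # a real system would also append the model's answer
    return budgets
```

The sketch captures the sequential-dependency point from the abstract: each allocation shrinks the pool available to later turns, so overspending on an easy early turn starves the harder steps that follow.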
Problem

Research questions and friction points this paper is trying to address.

multi-turn reasoning
compute efficiency
sequential dependencies
token budget allocation
overthinking
Innovation

Methods, ideas, or system contributions that make the work stand out.

adaptive budget allocation
multi-turn reasoning
Markov Decision Process
token efficiency
Group Relative Policy Optimization
🔎 Similar Papers