🤖 AI Summary
To address the reasoning redundancy caused by "overthinking" in large language models, this paper proposes a concise-reasoning framework based on Lagrangian optimization, formalizing the minimization of intermediate steps as a length-optimization problem under a performance constraint. The core contribution is the Performance-Aware Length Update (PALU) algorithm, which combines off-policy rollout-based performance estimation, a Lagrange multiplier truncated to its two extremes, and quantile-driven length adjustment, balancing theoretical grounding with engineering practicality. Unlike heuristic-based approaches, PALU requires no hand-crafted rules and generalizes across tasks (logical reasoning, STEM, and mathematics) and model scales (1.5B–14B parameters). Averaged over five benchmarks on DeepSeek-Distill-Qwen-1.5B, the method reduces output token length by 65% while improving accuracy by 15%.
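The constrained formulation the summary refers to can be sketched as follows; the notation (policy $\pi_\theta$, length $L$, accuracy $A$, reference accuracy $A_{\mathrm{ref}}$, tolerance $\delta$) is illustrative and not taken from the paper:

```latex
% Hedged sketch: minimize expected response length subject to a performance
% constraint, then form the Lagrangian to obtain an unconstrained objective.
\begin{aligned}
  \min_{\theta}\;& \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_{\theta}(\cdot \mid x)}\big[\,L(y)\,\big] \\
  \text{s.t.}\;& \mathbb{E}_{x,\, y}\big[\,A(x, y)\,\big] \;\ge\; A_{\mathrm{ref}} - \delta
\end{aligned}
\qquad\Longrightarrow\qquad
\max_{\lambda \ge 0}\;\min_{\theta}\;
\mathbb{E}\big[\,L(y)\,\big]
+ \lambda\Big(A_{\mathrm{ref}} - \delta - \mathbb{E}\big[\,A(x, y)\,\big]\Big)
```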
📝 Abstract
Concise reasoning in large language models seeks to generate only the essential intermediate steps needed to arrive at a final answer, thereby alleviating issues of overthinking. Most proposed approaches hinge on carefully hand-crafted heuristics and struggle to balance concision with performance, often failing to adapt across domains and model scales. In this work, we address these challenges by introducing a principled and pragmatic strategy, performance-aware length updating (PALU). As a principled algorithm, PALU formulates concise reasoning as a constrained optimization problem, minimizing response length subject to a performance constraint, and then applies Lagrangian optimization to convert it into a tractable unconstrained problem. As a pragmatic solution, PALU streamlines complicated update rules through three approximations: (i) estimating performance with off-policy rollouts, (ii) truncating the Lagrange multiplier to two extremes, and (iii) replacing gradient-based updates with quantile-driven length adjustments. PALU reduces output length by 65% while improving accuracy by 15% when applied to DeepSeek-Distill-Qwen-1.5B, averaged over five benchmarks, outperforming a range of alternative methods. Furthermore, PALU is demonstrated to adapt across both domains (logic, STEM, and math) and model scales (1.5B, 7B, 14B), establishing the algorithm as a practical and effective concise reasoning approach.
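A minimal sketch of the update step the three approximations describe, under assumptions: the `palu_length_update` function, its quantile thresholds, and the batch-of-rollouts interface are hypothetical placeholders, not the paper's implementation.

```python
import numpy as np


def palu_length_update(rollout_lengths, rollout_correct,
                       target_accuracy, current_length_budget,
                       quantile_down=0.5, quantile_up=0.9):
    """One performance-aware length update (hedged sketch, not the paper's code).

    rollout_lengths : token lengths of stored off-policy rollouts for a batch
    rollout_correct : 1/0 correctness of each rollout
    target_accuracy : accuracy level the performance constraint must maintain
    """
    lengths = np.asarray(rollout_lengths, dtype=float)
    correct = np.asarray(rollout_correct, dtype=float)

    # (i) Off-policy performance estimate: empirical accuracy of the rollouts.
    est_accuracy = correct.mean()

    # (ii) Lagrange multiplier truncated to two extremes: instead of a
    # gradient-updated lambda, keep only "constraint satisfied" vs. "violated".
    constraint_satisfied = est_accuracy >= target_accuracy

    # (iii) Quantile-driven length adjustment in place of a gradient step:
    # tighten toward a low quantile of correct-rollout lengths when the
    # constraint holds, relax toward a high quantile of all lengths otherwise.
    if constraint_satisfied and correct.any():
        return min(current_length_budget,
                   float(np.quantile(lengths[correct == 1.0], quantile_down)))
    return max(current_length_budget,
               float(np.quantile(lengths, quantile_up)))
```

In this reading, the satisfied/violated branch plays the role of the multiplier collapsed to its two extremes, and the quantile rule stands in for the gradient-based length update.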