Concise Reasoning in the Lens of Lagrangian Optimization

📅 2025-10-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address reasoning redundancy caused by "overthinking" in large language models, this paper proposes a concise reasoning framework based on Lagrangian optimization, formalizing intermediate-step minimization as a length-optimization problem under a performance constraint. The core contribution is the Performance-Aware Length Update (PALU) algorithm, which combines off-policy rollout-based performance estimation, truncated Lagrange multiplier updates, and quantile-driven dynamic length adjustment, balancing theoretical rigor with engineering practicality. Unlike heuristic-based approaches, PALU requires no hand-crafted rules and generalizes across diverse tasks (logical reasoning, STEM, and mathematics) and model scales (1.5B–14B parameters). Averaged over five benchmarks on DeepSeek-Distill-Qwen-1.5B, the method reduces output token length by 65% while improving accuracy by 15%.

📝 Abstract
Concise reasoning in large language models seeks to generate only the essential intermediate steps needed to arrive at a final answer, thereby alleviating issues of overthinking. Most proposed approaches hinge on carefully hand-crafted heuristics and struggle to balance concision with performance, often failing to adapt across domains and model scales. In this work, we address these challenges by introducing a principled and pragmatic strategy, performance-aware length updating (PALU). As a principled algorithm, PALU formulates concise reasoning as a constrained optimization problem, minimizing response length subject to a performance constraint, and then applies Lagrangian optimization to convert it into a tractable unconstrained problem. As a pragmatic solution, PALU streamlines complicated update rules through three approximations: (i) estimating performance with off-policy rollouts, (ii) truncating the Lagrange multiplier to its two extremes, and (iii) replacing gradient-based updates with quantile-driven length adjustments. PALU reduces output length by 65% while improving accuracy by 15% when applied to DeepSeek-Distill-Qwen-1.5B, averaged over five benchmarks, outperforming a range of alternative methods. Furthermore, PALU is demonstrated to adapt across both domains (logic, STEM, and math) and model scales (1.5B, 7B, 14B), entrenching the algorithm as a practical and effective concise reasoning approach.
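The constrained formulation described in the abstract can be sketched as follows; the exact notation is not given in this summary, so the symbols below (policy π_θ, length |y|, accuracy Acc, performance threshold δ, multiplier λ) are assumptions for illustration:

```latex
% Length minimization under a performance constraint:
\min_{\theta}\; \mathbb{E}_{y \sim \pi_\theta}\!\left[\,|y|\,\right]
\quad \text{s.t.} \quad
\mathbb{E}_{y \sim \pi_\theta}\!\left[\mathrm{Acc}(y)\right] \ge \delta .

% Lagrangian relaxation into an unconstrained min--max problem:
\mathcal{L}(\theta, \lambda)
= \mathbb{E}_{y \sim \pi_\theta}\!\left[\,|y|\,\right]
- \lambda \left( \mathbb{E}_{y \sim \pi_\theta}\!\left[\mathrm{Acc}(y)\right] - \delta \right),
\qquad \min_{\theta}\, \max_{\lambda \ge 0}\; \mathcal{L}(\theta, \lambda).
```

When the performance constraint is violated, the inner maximization drives λ upward, shifting the objective toward accuracy; when the constraint is satisfied, λ shrinks toward zero and length minimization dominates. PALU's second approximation truncates this continuous multiplier to its two extremes.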
Problem

Research questions and friction points this paper is trying to address.

Optimizing concise reasoning to minimize output length
Balancing conciseness with performance across domains
Adapting reasoning methods to different model scales
Innovation

Methods, ideas, or system contributions that make the work stand out.

Formulates concise reasoning as constrained optimization problem
Applies Lagrangian optimization for tractable unconstrained solution
Uses three approximations to streamline update rules
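A minimal sketch of how these three approximations might fit together in one update step. This is an illustration under stated assumptions, not the authors' implementation: the function name, signature, target accuracy threshold, and quantile choices are all hypothetical, and "off-policy rollouts" are represented simply as precomputed (length, correctness) pairs.

```python
import numpy as np

def palu_length_update(lengths, correct, target_acc, quantile=0.5):
    """One hypothetical PALU-style update of the length budget.

    lengths: token lengths of off-policy rollouts (approximation i)
    correct: per-rollout correctness flags
    target_acc: performance constraint threshold
    Returns the new length budget.
    """
    lengths = np.asarray(lengths, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    acc = correct.mean()  # performance estimated from off-policy rollouts

    # Approximation (ii): truncate the Lagrange multiplier to two extremes.
    # Constraint violated -> multiplier at its upper extreme: prioritize
    # performance and relax the budget toward a high quantile of lengths.
    if acc < target_acc:
        return float(np.quantile(lengths, 0.9))

    # Constraint satisfied -> multiplier at its lower extreme: prioritize
    # concision. Approximation (iii): instead of a gradient step, set the
    # budget to a quantile of the lengths of *correct* rollouts.
    return float(np.quantile(lengths[correct], quantile))
```

For example, if all rollouts are correct, the budget tightens to the median length of the correct rollouts; if accuracy falls below the threshold, it relaxes toward the 90th-percentile length.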