LEAD: Length-Efficient Adaptive and Dynamic Reasoning for Large Language Models

📅 2026-05-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Large language models often generate chains of thought far longer than a problem requires, which wastes compute and context. Existing length-reward approaches rely on static reward weights and global length constraints, so they cannot keep correctness and efficiency balanced as training progresses. The authors propose LEAD, which replaces these static heuristics with online, self-adaptive mechanisms. LEAD uses Potential-Scaled Instability to recalibrate the correctness-efficiency trade-off at each training step, directing optimization toward the most informative learning signal. It also estimates a per-problem target length online from the model’s own correct rollouts and applies a symmetric efficiency reward that penalizes both overthinking and over-compression. On five mathematical reasoning benchmarks, LEAD achieves the highest accuracy and Accuracy-Efficiency Score among RL-trained efficient-reasoning methods while producing substantially shorter outputs than the base model.

📝 Abstract

Large reasoning models, such as OpenAI o1 and DeepSeek-R1, tend to become increasingly verbose as their reasoning capabilities improve. These inflated Chain-of-Thought (CoT) trajectories often exceed what the underlying problems require, wasting compute, latency, and context budgets. While introducing length-based efficiency rewards during reinforcement learning offers a natural remedy, existing methods struggle with two fundamental challenges: the optimal balance between correctness and efficiency is non-stationary throughout training, and intrinsic reasoning budgets vary drastically across problems. Relying on static reward weights and global length constraints inevitably forces a compromise between degraded accuracy and unrealized compression. To overcome these limitations, we propose LEAD (Length-Efficient Adaptive and Dynamic reasoning), a method that replaces static heuristics with online, self-adaptive mechanisms. LEAD dynamically calibrates the correctness-efficiency trade-off at each step using a Potential-Scaled Instability, directing optimization capacity to the most informative learning signal. Furthermore, it estimates an adaptive per-problem target length online based on the model's own correct rollouts, applying a symmetric efficiency reward that penalizes both overthinking and over-compression. Evaluated on five mathematical reasoning benchmarks, LEAD achieves the highest accuracy and Accuracy-Efficiency Score among RL-trained efficient-reasoning methods while producing substantially shorter outputs than the base model.
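The per-problem target length and symmetric efficiency reward described in the abstract can be illustrated with a minimal sketch. The abstract does not give LEAD's formulas, so everything here is an assumption made for illustration: the function names, the use of the median of correct rollout lengths as the target, the fallback value, and the linear reward shape are hypothetical and not taken from the paper. How this reward is weighted against correctness (the role of Potential-Scaled Instability) is not shown.

```python
from statistics import median


def estimate_target_length(lengths, correct, fallback):
    """Assumed estimator: median length of the correct rollouts for one problem.

    Returns `fallback` when no rollout is correct (hypothetical choice).
    """
    correct_lengths = [n for n, ok in zip(lengths, correct) if ok]
    return median(correct_lengths) if correct_lengths else fallback


def symmetric_efficiency_reward(length, target):
    """Assumed reward shape: 1.0 at the target, decreasing linearly with the
    relative deviation in either direction, floored at 0.0.

    Outputs that are too long (overthinking) and too short (over-compression)
    are penalized equally.
    """
    deviation = abs(length - target) / target
    return max(0.0, 1.0 - deviation)


# Four rollouts for a single problem: token lengths and correctness flags.
lengths = [400, 800, 1200, 2000]
correct = [True, True, True, False]

target = estimate_target_length(lengths, correct, fallback=1000)
rewards = [symmetric_efficiency_reward(n, target) for n in lengths]

print(target)   # 800
print(rewards)  # [0.5, 1.0, 0.5, 0.0]
```

In this toy run the 400-token and 1200-token rollouts receive the same reward because they deviate from the 800-token target by the same amount, which is the symmetry the abstract describes.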

Problem

Research questions and friction points this paper is trying to address.

Chain-of-Thought

reasoning efficiency

length optimization

large language models

adaptive reasoning

Innovation

Methods, ideas, or system contributions that make the work stand out.

adaptive reasoning

length-efficient LLMs

dynamic reward calibration