Explore Briefly, Then Decide: Mitigating LLM Overthinking via Cumulative Entropy Regulation

📅 2025-10-02
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) frequently exhibit overthinking — redundant Chain-of-Thought (CoT) steps on simple problems — leading to inefficiency and poor control over reasoning depth. Method: This paper proposes a dynamic reasoning paradigm, "Explore Briefly, Then Decide," built on two mechanisms: the Token Entropy Cumulative Average (TECA) metric and Cumulative Entropy Regulation (CER). TECA tracks the cumulative average of token entropies during inference, and CER uses it to determine when the model should conclude its thought process and commit to a final answer. The approach requires no additional training or fine-tuning and integrates into existing CoT frameworks. Contribution/Results: Experiments across multiple mathematical reasoning benchmarks show response-length reductions of up to 71% on simpler datasets without sacrificing accuracy, improving both inference efficiency and adaptive reasoning.

📝 Abstract
Large Language Models (LLMs) have demonstrated remarkable reasoning abilities on complex problems using long Chain-of-Thought (CoT) reasoning. However, they often suffer from overthinking: generating unnecessarily lengthy reasoning steps for simpler problems. This issue may degrade the efficiency of the models and makes it difficult to adapt reasoning depth to the complexity of problems. To address this, we introduce a novel metric, Token Entropy Cumulative Average (TECA), which measures the extent of exploration throughout the reasoning process. We further propose a novel reasoning paradigm -- Explore Briefly, Then Decide -- with an associated Cumulative Entropy Regulation (CER) mechanism. This paradigm leverages TECA to help the model dynamically determine the optimal point to conclude its thought process and provide a final answer, thus achieving efficient reasoning. Experimental results across diverse mathematical benchmarks show that our approach substantially mitigates overthinking without sacrificing problem-solving ability. With our thinking paradigm, the average response length decreases by up to 71% on simpler datasets, demonstrating the effectiveness of our method in creating a more efficient and adaptive reasoning process.
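The abstract describes TECA as a cumulative average of token entropies that a stopping rule (CER) can consult during decoding. A minimal Python sketch of that idea is below; the exact definition and the threshold-based `should_stop` check are assumptions for illustration, not the paper's published formulation.

```python
import math

def token_entropy(probs):
    """Shannon entropy (nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def teca(entropies):
    """Cumulative average of per-token entropies up to each decoding step.

    This is one plausible reading of the Token Entropy Cumulative Average
    (TECA); the paper's exact definition may differ.
    """
    total, out = 0.0, []
    for step, h in enumerate(entropies, start=1):
        total += h
        out.append(total / step)
    return out

def should_stop(teca_values, threshold=0.5):
    """Hypothetical CER-style check: conclude the thought process once the
    cumulative-average entropy drops below a chosen threshold, i.e. the
    model has stopped exploring and is producing low-uncertainty tokens."""
    return bool(teca_values) and teca_values[-1] < threshold
```

In a decoding loop, one would append each step's `token_entropy` of the model's next-token distribution, recompute `teca`, and trigger the final-answer phase when `should_stop` fires; the threshold here is a placeholder knob, not a value from the paper.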
Problem

Research questions and friction points this paper is trying to address.

Mitigating LLM overthinking in reasoning processes
Dynamically regulating reasoning depth via entropy metrics
Reducing unnecessarily lengthy responses to simpler problems
Innovation

Methods, ideas, or system contributions that make the work stand out.

TECA metric measures reasoning exploration extent
CER mechanism dynamically concludes thought process
Explore Briefly, Then Decide paradigm reduces overthinking
👥 Authors
Tianyi Jiang — Tongji University
Yi Bin — National University of Singapore (Multimedia, Vision and Language, Deep Learning)
Yujuan Ding — The Hong Kong Polytechnic University (Computational Fashion, Recommendation, Information Retrieval)
Kainian Zhu — Shanghai University of Electric Power
Fei Ma — Guangdong Laboratory of Artificial Intelligence and Digital Economy
Jingkuan Song — Tongji University
Heng Tao Shen — Tongji University