Stop Spinning Wheels: Mitigating LLM Overthinking via Mining Patterns for Early Reasoning Exit

📅 2025-08-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) suffer from performance degradation and excessive computational overhead during complex reasoning due to "overthinking": unnecessary continuation of chain-of-thought (CoT) generation beyond the point where reasoning is effectively complete. Method: The authors propose an early-exit mechanism based on identifying the Reasoning Completion Point (RCP), the critical state at which the CoT naturally terminates. They partition the reasoning process into three stages (insufficient exploration, compensatory reasoning, and reasoning convergence) and introduce a lightweight, heuristic thresholding strategy that dynamically detects RCPs by jointly modeling the end-of-thinking token probability, the relative change rate of reasoning length, and semantic stability. Contribution/Results: The method requires no fine-tuning or auxiliary training. Evaluated on the AIME24, AIME25, and GPQA-D benchmarks, it reduces average token consumption by 38.6% while maintaining or improving accuracy by 0.9–2.3 percentage points, enhancing both inference efficiency and robustness.
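The thresholding rule described above can be sketched as a simple conjunction of the three per-step signals. This is an illustrative reconstruction, not the paper's implementation: the signal names, weights, and threshold values below are assumptions chosen for readability.

```python
# Hypothetical sketch of the early-exit rule: flag the Reasoning Completion
# Point (RCP) when all three heuristic signals indicate convergence.
# Thresholds are illustrative placeholders, not the paper's tuned values.
from dataclasses import dataclass


@dataclass
class StepSignals:
    eot_prob: float        # probability of the end-of-thinking token (e.g. </think>)
    length_change: float   # relative change rate of the reasoning length
    semantic_sim: float    # similarity of the current step to the previous one


def is_rcp(signals: StepSignals,
           eot_thresh: float = 0.2,
           length_thresh: float = 0.05,
           sim_thresh: float = 0.95) -> bool:
    """Return True when all three heuristics say reasoning has converged."""
    return (signals.eot_prob >= eot_thresh                   # model is ready to stop
            and abs(signals.length_change) <= length_thresh  # length has plateaued
            and signals.semantic_sim >= sim_thresh)          # steps add little new content
```

Requiring all three conditions at once is what makes the rule conservative: any single signal (e.g. a brief spike in end-of-thinking probability) is not enough to trigger an exit.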

📝 Abstract
Large language models (LLMs) enhance complex reasoning tasks by scaling the individual thinking process. However, prior work shows that overthinking can degrade overall performance. Motivated by observed patterns in thinking length and content length, we categorize reasoning into three stages: insufficient exploration stage, compensatory reasoning stage, and reasoning convergence stage. Typically, LLMs produce correct answers in the compensatory reasoning stage, whereas reasoning convergence often triggers overthinking, causing increased resource usage or even infinite loops. Therefore, mitigating overthinking hinges on detecting the end of the compensatory reasoning stage, defined as the Reasoning Completion Point (RCP). RCP typically appears at the end of the first complete reasoning cycle and can be identified by querying the LLM sentence by sentence or monitoring the probability of an end-of-thinking token (e.g., </think>), though these methods lack an efficient and precise balance. To improve this, we mine more sensitive and consistent RCP patterns and develop a lightweight thresholding strategy based on heuristic rules. Experimental evaluations on benchmarks (AIME24, AIME25, GPQA-D) demonstrate that the proposed method reduces token consumption while preserving or enhancing reasoning accuracy.
Problem

Research questions and friction points this paper is trying to address.

Mitigating LLM overthinking to reduce resource consumption
Identifying Reasoning Completion Point to prevent infinite loops
Balancing efficiency and accuracy in early reasoning exit
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mining sensitive RCP patterns for early exit
Lightweight thresholding strategy with heuristic rules
Reduces token consumption while preserving accuracy