🤖 AI Summary
To address the “underthinking” problem in large language models (LLMs) during long chain-of-thought reasoning, where models frequently and prematurely switch away from promising reasoning paths, this paper proposes SmartSwitch, a plug-and-play inference framework. Its core is a perception–intervention mechanism: an off-the-shelf process reward model detects premature path switches during decoding, and an intervention step then backtracks to the switch point and injects a depth-encouraging prompt to guide deeper exploration of the high-potential path. SmartSwitch requires no model fine-tuning and works with LLMs of diverse scales. Evaluated on multiple mathematical reasoning benchmarks (e.g., GSM8K, MATH), it achieves substantial gains in accuracy (+3.2–5.7 percentage points) and token efficiency (an 18–25% reduction in reasoning steps), demonstrating its effectiveness, generality, and ease of deployment.
📝 Abstract
The long chain-of-thought (LongCoT) capability is central to the recent breakthroughs achieved by large language models in complex reasoning tasks. However, the accompanying issue of "underthinking", where models exhibit shallow reasoning by frequently switching thoughts without sufficient exploration, limits both performance and token efficiency. To address this problem, we propose a simple yet effective reasoning strategy: the SmartSwitch inference framework. This framework can be easily integrated into any large language model as a plug-and-play solution, continuously monitoring the model's reasoning process to detect underthinking and guide it toward deeper exploration of promising but overlooked thoughts. Specifically, the perception module identifies points where thoughts switch and evaluates the potential of the preceding thought using an off-the-shelf process reward model (PRM). If a high-potential thought is found to be prematurely abandoned, the intervention module interrupts the ongoing inference, backtracks to the point before the switch, and inserts a "deepening prompt" to encourage further exploration along that promising path. Extensive experiments on challenging mathematical reasoning benchmarks demonstrate that our method significantly enhances the performance of various large language models of different sizes.
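The perception–intervention loop the abstract describes can be sketched roughly as follows. Everything here is a hypothetical stand-in, not the paper's actual implementation: the switch-point detector is a simple surface-marker check, the `generate` and `prm_score` callables are toy interfaces, and the threshold and deepening-prompt text are invented for illustration.

```python
# Hypothetical sketch of SmartSwitch's perception-intervention loop.
# None of these names or interfaces come from the paper; they only
# illustrate the control flow described in the abstract.

SWITCH_MARKERS = ("Alternatively,", "Let me try another approach")
DEEPENING_PROMPT = "\nWait, this direction looks promising; let us pursue it further.\n"
PRM_THRESHOLD = 0.7  # assumed cutoff for a "high-potential" thought


def starts_switch(thought):
    """Perception, part 1: flag a thought-switch point via surface markers."""
    return thought.lstrip().startswith(SWITCH_MARKERS)


def smart_switch_decode(generate, prm_score, prompt, max_thoughts=16):
    """Monitor generation thought-by-thought; when a high-potential thought
    is about to be abandoned at a switch point, backtrack (i.e., discard the
    switching thought) and append a deepening prompt instead.

    generate(context)           -> next thought string ("" when finished)
    prm_score(context, thought) -> potential score in [0, 1]
    """
    context, prev_thought = prompt, None
    interventions = 0
    for _ in range(max_thoughts):
        thought = generate(context)
        if not thought:
            break
        # Perception, part 2: at a switch point, score the preceding thought.
        if (prev_thought is not None and starts_switch(thought)
                and prm_score(context, prev_thought) >= PRM_THRESHOLD
                and interventions == 0):  # one intervention in this toy sketch
            # Intervention: drop the premature switch and encourage deeper
            # exploration of the thought that was about to be abandoned.
            context += DEEPENING_PROMPT
            interventions += 1
            continue
        context += thought
        prev_thought = thought
    return context, interventions


# Toy demonstration with a scripted "model" and a constant PRM.
script = iter(["Try factoring the quadratic. ",
               "Alternatively, guess and check. ",
               "So x = 3. "])
toy_generate = lambda ctx: next(script, "")
toy_prm = lambda ctx, th: 0.9  # pretend every thought is high-potential

out, n = smart_switch_decode(toy_generate, toy_prm, "Solve x^2-5x+6=0. ")
```

In the toy run, the "Alternatively, …" switch is intercepted: the PRM deems the factoring thought high-potential, so the switching thought is discarded and the deepening prompt is appended in its place, after which generation continues on the original path.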