🤖 AI Summary
To address the “underthinking” problem in large language models (LLMs) during long chain-of-thought reasoning, where models frequently and prematurely switch away from promising reasoning paths, this paper proposes SmartSwitch, a plug-and-play inference framework. Its core is a perception–intervention mechanism: an off-the-shelf process reward model detects premature path switches during decoding, and an intervention step then backtracks to the switch point and injects a depth-encouraging prompt to guide deeper exploration of the high-potential path. SmartSwitch requires no model fine-tuning and works with LLMs of diverse scales. Evaluated on multiple mathematical reasoning benchmarks (e.g., GSM8K, MATH), it achieves substantial gains in accuracy (+3.2–5.7 percentage points) and token efficiency (an 18–25% reduction in reasoning steps), demonstrating its effectiveness, generality, and ease of deployment.
📝 Abstract
The long chain-of-thought (LongCoT) capability is central to the recent breakthroughs achieved by large language models in complex reasoning tasks. However, the accompanying issue of "underthinking", where models exhibit shallow reasoning by frequently switching thoughts without sufficient exploration, limits both performance and token efficiency. To address this problem, we propose a simple yet effective reasoning strategy: the SmartSwitch inference framework. This framework can be easily integrated into any large language model as a plug-and-play solution, continuously monitoring the model's reasoning process to detect underthinking and guide it toward deeper exploration of promising but overlooked thoughts. Specifically, the perception module identifies points where thoughts switch and evaluates the potential of the preceding thought using an off-the-shelf process reward model (PRM). If a high-potential thought is found to be prematurely abandoned, the intervention module interrupts the ongoing inference, backtracks to the point before the switch, and inserts a "deepening prompt" to encourage further exploration along that promising path. Extensive experiments on challenging mathematical reasoning benchmarks demonstrate that our method significantly enhances the performance of various large language models of different sizes.
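The perception–intervention loop the abstract describes can be sketched roughly as follows. Everything here is a hypothetical stand-in, not the paper's actual implementation: the switch-point detector is a simple surface-marker check, the `generate` and `prm_score` callables are toy interfaces, and the threshold and deepening-prompt text are invented for illustration.

```python
# Hypothetical sketch of SmartSwitch's perception-intervention loop.
# None of these names or interfaces come from the paper; they only
# illustrate the control flow described in the abstract.

SWITCH_MARKERS = ("Alternatively,", "Let me try another approach")
DEEPENING_PROMPT = "\nWait, this direction looks promising; let us pursue it further.\n"
PRM_THRESHOLD = 0.7  # assumed cutoff for a "high-potential" thought


def starts_switch(thought):
    """Perception, part 1: flag a thought-switch point via surface markers."""
    return thought.lstrip().startswith(SWITCH_MARKERS)


def smart_switch_decode(generate, prm_score, prompt, max_thoughts=16):
    """Monitor generation thought-by-thought; when a high-potential thought
    is about to be abandoned at a switch point, backtrack (i.e., discard the
    switching thought) and append a deepening prompt instead.

    generate(context)           -> next thought string ("" when finished)
    prm_score(context, thought) -> potential score in [0, 1]
    """
    context, prev_thought = prompt, None
    interventions = 0
    for _ in range(max_thoughts):
        thought = generate(context)
        if not thought:
            break
        # Perception, part 2: at a switch point, score the preceding thought.
        if (prev_thought is not None and starts_switch(thought)
                and prm_score(context, prev_thought) >= PRM_THRESHOLD
                and interventions == 0):  # one intervention in this toy sketch
            # Intervention: drop the premature switch and encourage deeper
            # exploration of the thought that was about to be abandoned.
            context += DEEPENING_PROMPT
            interventions += 1
            continue
        context += thought
        prev_thought = thought
    return context, interventions


# Toy demonstration with a scripted "model" and a constant PRM.
script = iter(["Try factoring the quadratic. ",
               "Alternatively, guess and check. ",
               "So x = 3. "])
toy_generate = lambda ctx: next(script, "")
toy_prm = lambda ctx, th: 0.9  # pretend every thought is high-potential

out, n = smart_switch_decode(toy_generate, toy_prm, "Solve x^2-5x+6=0. ")
```

In the toy run, the "Alternatively, …" switch is intercepted: the PRM deems the factoring thought high-potential, so the switching thought is discarded and the deepening prompt is appended in its place, after which generation continues on the original path.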