AI Summary
Large language models often fall into suboptimal or redundant reasoning paths due to a lack of foresight. This work proposes a heuristic search-based neural chain-of-thought framework that formulates reasoning as a dynamic exploration of sparse, high-quality thought trajectories. The approach introduces a dual-factor heuristic strategy that jointly optimizes accuracy and computational cost to actively evaluate and select superior reasoning operators. Empirical results demonstrate that the method achieves an average accuracy improvement of over 3.5% across multiple reasoning benchmarks while reducing reasoning generation length by more than 22%, thereby attaining a Pareto improvement in both efficiency and performance.
Abstract
Chain-of-Thought reasoning has significantly enhanced the problem-solving capabilities of Large Language Models. Unfortunately, current models generate reasoning steps sequentially without foresight, often becoming trapped in suboptimal reasoning paths with redundant steps. In contrast, we introduce Neural Chain-of-Thought Search (NCoTS), a framework that reformulates reasoning as a dynamic search for the optimal thinking strategy. By quantitatively characterizing the solution space, we reveal the existence of sparse superior reasoning paths that are simultaneously more accurate and concise than standard outputs. Our method actively navigates towards these paths by evaluating candidate reasoning operators using a dual-factor heuristic that optimizes for both correctness and computational cost. Consequently, NCoTS achieves a Pareto improvement across diverse reasoning benchmarks, boosting accuracy by over 3.5% while reducing generation length by over 22%. Our code and data are available at https://github.com/MilkThink-Lab/Neural-CoT-Search.
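The dual-factor heuristic described above can be illustrated with a minimal sketch: score each candidate reasoning operator by its estimated correctness minus a penalty on its expected generation cost, then greedily select the highest-scoring one. The names here (`Candidate`, `heuristic_score`, the weight `lam`, and the example operators and values) are illustrative assumptions, not taken from the NCoTS implementation.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    """A hypothetical reasoning operator with heuristic estimates."""
    name: str
    est_accuracy: float  # estimated probability the step leads to a correct answer
    est_cost: float      # expected generation cost (e.g., normalized token count)

def heuristic_score(c: Candidate, lam: float = 0.5) -> float:
    # Dual-factor heuristic: reward estimated correctness, penalize cost.
    return c.est_accuracy - lam * c.est_cost

def select_operator(candidates: list[Candidate], lam: float = 0.5) -> Candidate:
    # Pick the operator with the best accuracy-cost trade-off.
    return max(candidates, key=lambda c: heuristic_score(c, lam))

# Toy example: a cheap direct step can beat a more accurate but costly one.
candidates = [
    Candidate("decompose", est_accuracy=0.80, est_cost=0.6),  # 0.80 - 0.30 = 0.50
    Candidate("direct",    est_accuracy=0.70, est_cost=0.2),  # 0.70 - 0.10 = 0.60
    Candidate("verify",    est_accuracy=0.85, est_cost=0.9),  # 0.85 - 0.45 = 0.40
]
print(select_operator(candidates).name)  # -> direct
```

The weight `lam` controls the trade-off between the two factors: a larger value favors shorter reasoning at the risk of accuracy, which is the tension a Pareto improvement resolves.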