Guided Stream of Search: Learning to Better Search with Language Models via Optimal Path Guidance

📅 2024-10-03
🏛️ arXiv.org
📈 Citations: 4
✨ Influential: 0
📄 PDF
🤖 AI Summary
Language models still suffer from low search efficiency and poor trajectory quality on complex reasoning tasks. To address this, the paper proposes Guided Stream of Search (GSoS), a framework that uses the optimal solution as a stepwise landmark to guide the model in generating high-quality, low-noise search trajectories; these trajectories are then distilled into the pre-trained model via supervised fine-tuning and further refined through reinforcement learning (RL). GSoS departs from prior approaches that rely on suboptimal search processes or static reward signals. Evaluated on the Countdown mathematical reasoning benchmark, GSoS achieves significant gains: combining supervised fine-tuning with RL outperforms purely supervised methods and surpasses subgoal-based reward mechanisms, supporting the claim that optimal-solution-guided trajectories fundamentally improve search and planning ability.

πŸ“ Abstract
While language models have demonstrated impressive capabilities across a range of tasks, they still struggle with tasks that require complex planning and reasoning. Recent studies have proposed training language models on search processes rather than optimal solutions, resulting in better generalization performance even though search processes are noisy and even suboptimal. However, these studies overlook the value of optimal solutions, which can serve as step-by-step landmarks to guide more effective search. In this work, we explore how to leverage optimal solutions to enhance the search and planning abilities of language models. To this end, we propose guided stream of search (GSoS), which seamlessly incorporates optimal solutions into the self-generation process in a progressive manner, producing high-quality search trajectories. These trajectories are then distilled into the pre-trained model via supervised fine-tuning. Our approach significantly enhances the search and planning abilities of language models on Countdown, a simple yet challenging mathematical reasoning task. Notably, combining our method with RL fine-tuning yields further improvements, whereas previous supervised fine-tuning methods do not benefit from RL. Furthermore, our approach exhibits greater effectiveness than leveraging optimal solutions in the form of subgoal rewards.
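To make the benchmark concrete: in Countdown, the model must combine a set of given numbers with the four basic arithmetic operations (each number used once) to reach a target value. The brute-force solver below is an illustrative sketch of the task only, not the paper's method; the function name and interface are our own.

```python
def countdown(nums, target):
    """Illustrative brute-force Countdown solver: combine `nums` with
    +, -, *, / (integer division only when exact) to reach `target`.
    Returns a list of operation strings, or None if unreachable."""
    if len(nums) == 1:
        return [] if nums[0] == target else None
    for i in range(len(nums)):
        for j in range(len(nums)):
            if i == j:
                continue
            a, b = nums[i], nums[j]
            rest = [nums[k] for k in range(len(nums)) if k not in (i, j)]
            candidates = [
                (a + b, f"{a}+{b}={a + b}"),
                (a * b, f"{a}*{b}={a * b}"),
                (a - b, f"{a}-{b}={a - b}"),
            ]
            if b != 0 and a % b == 0:  # only allow exact division
                candidates.append((a // b, f"{a}/{b}={a // b}"))
            for value, op in candidates:
                sub = countdown(rest + [value], target)
                if sub is not None:
                    return [op] + sub
    return None
```

A search-trained language model emits a trace of such intermediate operations (including dead ends and backtracking) rather than only the final expression, which is why trajectory quality matters.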
Problem

Research questions and friction points this paper is trying to address.

Improving language model search efficiency in reasoning tasks
Addressing suboptimal search traces through guided self-training
Enhancing model performance on the Countdown arithmetic reasoning task
Innovation

Methods, ideas, or system contributions that make the work stand out.

Supervised fine-tuning on guided search trajectories, followed by RL fine-tuning
Incorporating optimal solutions as search landmarks
Generating high-quality search traces for distillation
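The bullets above can be sketched schematically: the model searches on its own, and steps from the known optimal solution are progressively injected as landmarks when self-generation fails, yielding a high-quality trace for distillation. All function and parameter names here (`model_propose_step`, `is_valid`, `max_tries`) are hypothetical stand-ins, not the paper's actual interfaces.

```python
def guided_trajectory(problem, optimal_steps, model_propose_step, is_valid,
                      max_tries=3):
    """Build one search trace, falling back to optimal-solution steps
    (landmarks) when the model's own proposals fail. Schematic only."""
    trace, state = [], problem.initial_state
    for landmark in optimal_steps:      # next step of the optimal solution
        step = None
        for _ in range(max_tries):      # let the model search on its own first
            cand = model_propose_step(state, trace)
            if is_valid(state, cand):
                step = cand
                break
            trace.append(("explore", cand))  # keep failed exploration in the trace
        if step is None:
            step = landmark             # inject the optimal step as guidance
        trace.append(("step", step))
        state = problem.apply(state, step)
    return trace
```

Traces produced this way retain realistic exploration noise while still reaching the solution, which is what makes them suitable targets for supervised fine-tuning before RL.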