SmartSearch: Process Reward-Guided Query Refinement for Search Agents

📅 2026-01-08
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
This work addresses the limitations of existing large language model–based search agents, whose retrieval accuracy and reasoning performance are often compromised by the generation of low-quality intermediate queries. To mitigate this issue, the paper proposes a process reward–driven dual-level credit assignment mechanism coupled with a selective query optimization strategy. Furthermore, a three-stage curriculum learning framework—progressing from imitation to generalization—is designed to guide the agent in internalizing the ability to generate high-quality queries. Empirical evaluations demonstrate that the proposed approach significantly outperforms current state-of-the-art methods across multiple benchmarks, achieving notable improvements in both query quality and search efficiency.

📝 Abstract
Large language model (LLM)-based search agents have proven promising for addressing knowledge-intensive problems by incorporating information retrieval capabilities. Existing works largely focus on optimizing the reasoning paradigms of search agents, yet the quality of intermediate search queries during reasoning remains overlooked. As a result, the generated queries often remain inaccurate, leading to unexpected retrieval results and ultimately limiting search agents' overall effectiveness. To mitigate this issue, we introduce SmartSearch, a framework built upon two key mechanisms: (1) Process rewards, which provide fine-grained supervision for the quality of each intermediate search query through Dual-Level Credit Assessment. (2) Query refinement, which promotes the optimization of query generation by selectively refining low-quality search queries and regenerating subsequent search rounds based on these refinements. To enable the search agent to progressively internalize the ability to improve query quality under the guidance of process rewards, we design a three-stage curriculum learning framework. This framework guides the agent through a progression from imitation, to alignment, and ultimately to generalization. Experimental results show that SmartSearch consistently surpasses existing baselines, and additional quantitative analyses further confirm its significant gains in both search efficiency and query quality. The code is available at https://github.com/MYVAE/SmartSearch.
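The selective query refinement loop described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: `score_query`, `refine_query`, and the reward threshold are all hypothetical stand-ins (the paper uses a learned process reward with Dual-Level Credit Assessment and an LLM-based rewriter).

```python
# Hypothetical sketch of SmartSearch-style selective query refinement.
# All functions and the threshold below are illustrative assumptions,
# not the actual SmartSearch API.

REWARD_THRESHOLD = 0.5  # assumed cutoff separating low- from high-quality queries


def score_query(query: str) -> float:
    """Stand-in process reward: here it simply favors more specific
    (longer) queries. In the paper this is a learned reward produced
    via Dual-Level Credit Assessment."""
    return min(len(query.split()) / 8.0, 1.0)


def refine_query(query: str) -> str:
    """Stand-in refinement step. In the paper this would be an
    LLM-generated rewrite of the low-quality query."""
    return query + " (refined with added context)"


def selective_refinement(queries: list[str]) -> list[str]:
    """Refine only the queries whose process reward falls below the
    threshold, leaving high-quality queries untouched."""
    refined = []
    for q in queries:
        if score_query(q) < REWARD_THRESHOLD:
            refined.append(refine_query(q))
            # In SmartSearch, the search rounds after a refined query
            # are regenerated conditioned on the improved query.
        else:
            refined.append(q)
    return refined
```

The key design point the sketch captures is selectivity: only low-reward intermediate queries trigger refinement (and regeneration of later rounds), which is what keeps the extra supervision from inflating search cost on queries that were already good.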
Problem

Research questions and friction points this paper is trying to address.

search agents
query quality
large language models
information retrieval
knowledge-intensive tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Process Reward
Query Refinement
Search Agents
Curriculum Learning
Dual-Level Credit Assessment