🤖 AI Summary
Existing reinforcement learning (RL)-based search agents enforce sequential querying for multi-step retrieval tasks, failing to exploit logical independence among subtasks and thus suffering from low computational efficiency.
Method: We propose ParallelSearch, a novel RL framework that enables parallel search, comprising (i) query structure identification and decomposition to automatically discover parallelizable subqueries; (ii) a composite reward mechanism that jointly optimizes retrieval efficiency and decomposition quality while preserving answer accuracy; and (iii) RL training with verifiable rewards combined with parallel query scheduling.
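The composite reward described above can be sketched as a weighted sum of a correctness term, a decomposition-quality term, and a parallelism term. This is a minimal illustrative sketch: the specific terms, weights, and function names here are assumptions, not the paper's exact formulation.

```python
def composite_reward(answer_correct: bool,
                     decomposition_valid: bool,
                     num_llm_calls: int,
                     baseline_calls: int,
                     alpha: float = 0.8,
                     beta: float = 0.1,
                     gamma: float = 0.1) -> float:
    """Illustrative composite reward (weights and terms are hypothetical).

    - correctness: did the final answer match the gold answer?
    - decomposition: were the extracted subqueries truly independent?
    - parallelism: fraction of sequential LLM calls saved.
    """
    r_correct = 1.0 if answer_correct else 0.0
    r_decomp = 1.0 if decomposition_valid else 0.0
    # Reward saving calls relative to the sequential baseline.
    r_parallel = max(0.0, 1.0 - num_llm_calls / baseline_calls)
    return alpha * r_correct + beta * r_decomp + gamma * r_parallel
```

Weighting correctness most heavily reflects the stated design goal of preserving answer accuracy while still incentivizing good decompositions and fewer sequential calls.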
Results: Evaluated on seven QA benchmarks, ParallelSearch achieves an average performance gain of 2.9%, with a 12.7% improvement on parallelizable questions, while reducing LLM invocations to 69.6% of the sequential baseline, substantially alleviating the sequential execution bottleneck.
📝 Abstract
Reasoning-augmented search agents such as Search-R1, trained via reinforcement learning with verifiable rewards (RLVR), demonstrate remarkable capabilities in multi-step information retrieval from external knowledge sources. These agents address the limitations of their parametric memory by dynamically gathering relevant facts to solve complex reasoning tasks. However, existing approaches suffer from a fundamental architectural limitation: they process search queries strictly sequentially, even when handling inherently parallelizable and logically independent comparisons. This sequential bottleneck significantly constrains computational efficiency, particularly for queries that require multiple entity comparisons. To address this critical limitation, we propose ParallelSearch, a novel reinforcement learning framework that empowers large language models (LLMs) to recognize parallelizable query structures and execute multiple search operations concurrently. Our approach introduces dedicated reward functions that incentivize the identification of independent query components while preserving answer accuracy by jointly considering correctness, query decomposition quality, and parallel execution benefits. Comprehensive experiments demonstrate that ParallelSearch outperforms state-of-the-art baselines by an average performance gain of 2.9% across seven question-answering benchmarks. Notably, on parallelizable questions, our method achieves a 12.7% performance improvement while requiring only 69.6% of the LLM calls compared to sequential approaches.
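The concurrent execution of logically independent subqueries can be sketched with a thread pool. The `search` function here is a hypothetical stand-in for a real retrieval backend; the decomposition example is illustrative and not taken from the paper's benchmarks.

```python
from concurrent.futures import ThreadPoolExecutor


def search(query: str) -> str:
    # Stand-in for an actual retrieval call (hypothetical).
    return f"results for: {query}"


def parallel_search(subqueries: list[str]) -> list[str]:
    """Dispatch independent subqueries concurrently instead of one by one.

    pool.map preserves input order, so results line up with subqueries.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(subqueries))) as pool:
        return list(pool.map(search, subqueries))


# An entity-comparison question such as "Which is taller, the Eiffel
# Tower or Big Ben?" decomposes into two independent lookups:
docs = parallel_search(["height of the Eiffel Tower", "height of Big Ben"])
```

With a sequential agent, the second lookup could start only after the first returned; issuing both at once is what reduces the number of sequential LLM/retrieval round trips.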