Beyond the limitation of a single query: Train your LLM for query expansion with Reinforcement Learning

📅 2025-10-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing reasoning-augmented search agents suffer from low recall and limited accuracy on multi-hop question answering due to insufficient reasoning and retrieval capability and poor coordination across subtasks. To address this, the authors propose ExpandSearch, a reinforcement learning–based query expansion framework in which the agent generates diverse query variants each turn and searches them simultaneously to raise retrieval recall. A lightweight pre-trained "squeezer" model compresses and interprets the retrieved documents, decoupling evidence understanding from query generation so that a 3B-parameter LLM can focus exclusively on learning the query policy. With this design, even the small 3B model reaches state-of-the-art accuracy on multi-hop QA, outperforming prior best methods by an average of 4.4% across seven benchmarks, with the largest gains on tasks requiring aggregation of evidence from multiple sources.

📝 Abstract
Reasoning-augmented search agents, such as Search-R1, are trained to reason, search, and generate the final answer iteratively. Nevertheless, due to their limited capabilities in reasoning and search, their performance on multi-hop QA benchmarks remains far from satisfactory. To handle complex or compound queries, we train an LLM-based search agent with the native capability of query expansion through reinforcement learning. In each turn, our search agent proposes several query variants, which are searched simultaneously to cover more relevant information. Meanwhile, given limited post-training data and computing resources, it is very challenging for a search agent to master multiple tasks, including query generation, retrieved information understanding, and answer generation. Therefore, we propose incorporating a pre-trained squeezer model that helps the search agent understand the retrieved documents, allowing the search agent to focus on query generation for high retrieval recall. With the assistance of the squeezer model, we discover that even a small-scale 3B LLM can demonstrate a strong capability of query expansion and achieve state-of-the-art accuracy on the multi-hop QA benchmarks. To be specific, our experiments across seven question-answering benchmarks demonstrate that our method, named ExpandSearch, achieves an average improvement of 4.4% compared to state-of-the-art baselines, with strong gains on multi-hop reasoning tasks requiring diverse evidence aggregation.
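The turn structure the abstract describes (propose several query variants, search them in parallel, then let a squeezer model condense the pooled results) can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's implementation: `propose_query_variants`, `search`, and `squeeze` are hypothetical stand-ins for the RL-trained policy LLM, the retriever, and the pre-trained squeezer model, respectively.

```python
# Sketch of one ExpandSearch-style agent turn.
# All function names and parameters here are hypothetical stand-ins.

def propose_query_variants(question, context, k=3):
    # Stand-in for the RL-trained 3B policy LLM: emit k diverse
    # reformulations of the current information need.
    return [f"{question} (variant {i})" for i in range(k)]

def search(query, top_k=5):
    # Stand-in for the retriever; returns a list of documents.
    return [f"doc for '{query}' #{j}" for j in range(top_k)]

def squeeze(question, documents):
    # Stand-in for the frozen pre-trained squeezer model: compress
    # retrieved documents into a short evidence summary so the policy
    # LLM never has to read raw passages itself.
    return f"summary of {len(documents)} docs relevant to: {question}"

def expand_search_turn(question, context):
    variants = propose_query_variants(question, context)
    # Search all variants and pool deduplicated results to raise recall.
    pooled, seen = [], set()
    for q in variants:
        for doc in search(q):
            if doc not in seen:
                seen.add(doc)
                pooled.append(doc)
    # The squeezer's summary, not the raw documents, enters the context.
    return context + [squeeze(question, pooled)]

context = expand_search_turn("Who directed the film based on novel X?", [])
```

The key design point mirrored here is the division of labor: the policy model only decides *what to ask*, while evidence compression is delegated, which is what lets a small model concentrate its limited capacity on query generation.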
Problem

Research questions and friction points this paper is trying to address.

Enhancing multi-hop QA via reinforcement learning query expansion
Overcoming limited reasoning in search agents for complex queries
Improving retrieval recall with specialized squeezer model assistance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Trains LLM for query expansion using reinforcement learning
Uses pre-trained squeezer model to understand retrieved documents
Simultaneously searches multiple query variants for better coverage