Rethinking On-policy Optimization for Query Augmentation

📅 2025-10-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates the performance boundary between lightweight prompt-based and high-cost reinforcement learning (RL)-based query expansion methods. We propose On-policy Pseudo-document Query Expansion (OPQE), a hybrid framework that combines the flexibility of prompt engineering with the goal-directed optimization of RL: a policy model generates pseudo-documents explicitly optimized for retrieval effectiveness, without requiring additional annotations or supervised fine-tuning. Experiments across multiple retrieval benchmarks demonstrate that (1) simple, training-free prompting methods can match or even surpass RL-based query rewriting in certain scenarios, and (2) OPQE unifies the strengths of both paradigms, significantly outperforming pure-prompt and pure-RL baselines on mainstream datasets with average MRR@10 gains of 3.2–5.8 percentage points. This study establishes a new paradigm for efficient, scalable, LLM-driven query expansion.

📝 Abstract
Recent advances in large language models (LLMs) have led to a surge of interest in query augmentation for information retrieval (IR). Two main approaches have emerged. The first prompts LLMs to generate answers or pseudo-documents that serve as new queries, relying purely on the model's parametric knowledge or contextual information. The second applies reinforcement learning (RL) to fine-tune LLMs for query rewriting, directly optimizing retrieval metrics. Although each approach has its own advantages and limitations, the two have not been compared under consistent experimental conditions. In this work, we present the first systematic comparison of prompting-based and RL-based query augmentation across diverse benchmarks, including evidence-seeking, ad hoc, and tool retrieval. Our key finding is that simple, training-free query augmentation often performs on par with, or even surpasses, more expensive RL-based counterparts, especially when using powerful LLMs. Motivated by this discovery, we introduce a novel hybrid method, On-policy Pseudo-document Query Expansion (OPQE), in which, instead of rewriting a query, the LLM policy learns to generate a pseudo-document that maximizes retrieval performance, thus merging the flexibility and generative structure of prompting with the targeted optimization of RL. We show that OPQE outperforms both standalone prompting and RL-based rewriting, demonstrating that a synergistic approach yields the best results. Our implementation is made available to facilitate reproducibility.
Problem

Research questions and friction points this paper is trying to address.

Systematically comparing prompting-based and RL-based query augmentation methods
Evaluating simple training-free approaches against expensive RL fine-tuning
Developing a hybrid method combining prompting flexibility with RL optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid method combines prompting flexibility with RL optimization
Generates pseudo-documents to maximize retrieval performance
On-policy learning merges generative structure with targeted optimization
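The on-policy loop sketched in the bullets above can be illustrated with a minimal, self-contained REINFORCE toy. Everything here is an assumption for illustration, not the paper's implementation: the corpus, the fixed set of candidate pseudo-documents (stand-ins for LLM generations), and the token-overlap retriever are all hypothetical. The structure, though, mirrors the described method: sample a pseudo-document from the policy, retrieve with it, and use a reciprocal-rank reward to update the policy.

```python
import math
import random

# Toy corpus; one document is relevant to the training query.
CORPUS = [
    "python is a programming language used for scripting",
    "the eiffel tower is a landmark in paris france",
    "reinforcement learning optimizes a policy with rewards",
]
GOLD_DOC = 2  # index of the relevant document

# Candidate pseudo-documents the toy "policy" can emit for the query
# "what is reinforcement learning" (stand-ins for LLM generations).
CANDIDATES = [
    "paris is the capital of france",
    "a policy is trained with rewards in reinforcement learning",
    "python scripting tutorial",
]

def retrieve_rank(pseudo_doc):
    """Rank corpus docs by token overlap with the pseudo-document;
    return the 1-based rank of the gold document."""
    q = set(pseudo_doc.split())
    scores = [len(q & set(d.split())) for d in CORPUS]
    order = sorted(range(len(CORPUS)), key=lambda i: -scores[i])
    return order.index(GOLD_DOC) + 1

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train(steps=500, lr=0.5, seed=0):
    """On-policy REINFORCE: sample a pseudo-document, reward it with
    the reciprocal rank of the gold document, update the logits."""
    rng = random.Random(seed)
    logits = [0.0] * len(CANDIDATES)
    for _ in range(steps):
        probs = softmax(logits)
        a = rng.choices(range(len(CANDIDATES)), weights=probs)[0]
        reward = 1.0 / retrieve_rank(CANDIDATES[a])  # MRR-style reward
        # grad of log pi(a) w.r.t. logits is one_hot(a) - probs
        for i in range(len(logits)):
            logits[i] += lr * reward * ((1.0 if i == a else 0.0) - probs[i])
    return logits

logits = train()
best = max(range(len(CANDIDATES)), key=lambda i: logits[i])
print(CANDIDATES[best])  # the pseudo-document the policy learned to prefer
```

The policy converges toward the candidate whose tokens overlap most with the relevant document, because that candidate earns the highest reciprocal-rank reward. In the paper's actual method the action space is the LLM's full generation space rather than a fixed candidate list, and the retriever and reward are the real IR pipeline and metric.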