Generalized Reinforcement Learning for Retriever-Specific Query Rewriter with Unstructured Real-World Documents

📅 2025-07-31

🤖 AI Summary
Retrieval-augmented generation (RAG) systems face significant challenges in optimizing queries over diverse, unstructured real-world documents spanning textual and multimodal content, particularly when labeled data is unavailable and heterogeneous retrievers (e.g., lexical, semantic, hybrid, or multimodal) must be supported. Method: We propose RL-QR, a reinforcement learning-based, annotation-free, retriever-specific query rewriting framework. It combines Generalized Reward Policy Optimization (GRPO) with synthesized scenario-question pairs to train a query rewriter for each target retriever, over both text-only and multimodal databases. Contribution/Results: Evaluated on industrial in-house data, RL-QR achieves an 11% relative gain in NDCG@3 for multimodal RAG and a 9% gain for lexical retrievers. Rewriters trained for semantic and hybrid retrievers did not improve performance, likely due to training misalignments, so semantic retrieval remains an open direction for refinement.

📝 Abstract
Retrieval-Augmented Generation (RAG) systems rely heavily on effective query formulation to unlock external knowledge, yet optimizing queries for diverse, unstructured real-world documents remains a challenge. We introduce \textbf{RL-QR}, a reinforcement learning framework for retriever-specific query rewriting that eliminates the need for human-annotated datasets and extends applicability to both text-only and multi-modal databases. By synthesizing scenario-question pairs and leveraging Generalized Reward Policy Optimization (GRPO), RL-QR trains query rewriters tailored to specific retrievers, enhancing retrieval performance across varied domains. Experiments on industrial in-house data demonstrate significant improvements, with $\text{RL-QR}_{\text{multi-modal}}$ achieving an 11% relative gain in NDCG@3 for multi-modal RAG and $\text{RL-QR}_{\text{lexical}}$ yielding a 9% gain for lexical retrievers. However, challenges persist with semantic and hybrid retrievers, where rewriters failed to improve performance, likely due to training misalignments. Our findings highlight RL-QR's potential to revolutionize query optimization for RAG systems, offering a scalable, annotation-free solution for real-world retrieval tasks, while identifying avenues for further refinement in semantic retrieval contexts.
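The abstract reports its gains as relative improvements in NDCG@3. As a reading aid, here is a minimal sketch of how that metric and a relative gain are commonly computed. The function names are illustrative, not taken from the paper, and the sketch assumes the graded relevance list covers every relevant document for the query.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k relevance grades."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=3):
    """NDCG@k: DCG of the ranking divided by the DCG of the ideal ordering.

    Assumption: `relevances` lists the grade of each retrieved document in
    rank order and includes all relevant documents for the query.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

def relative_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline`."""
    return (improved - baseline) / baseline

# Relevant document ranked first vs. second.
top_hit = ndcg_at_k([1, 0, 0])
second_hit = ndcg_at_k([0, 1, 0])
```

With these definitions, a hypothetical baseline NDCG@3 of 0.50 rising to 0.555 would be an 11% relative gain; these two scores are made up for illustration and are not results from the paper.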
Problem

Research questions and friction points this paper is trying to address.

Optimizing queries for diverse unstructured real-world documents
Eliminating need for human-annotated datasets in query rewriting
Enhancing retrieval performance across text and multi-modal databases
Innovation

Methods, ideas, or system contributions that make the work stand out.

RL-QR uses reinforcement learning for query rewriting
GRPO trains retriever-specific rewriters without human-annotated data
Tailors rewriters to text and multi-modal retrievers
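To make the idea of a retriever-specific reward concrete, the sketch below scores candidate rewrites by how well a toy lexical retriever ranks a known target document. Everything here is an assumption for illustration: the corpus, the term-overlap retriever, the single-gold-document NDCG@3 reward, and the group-normalized advantage (the form used in common GRPO implementations) are not confirmed details of RL-QR.

```python
import math
import statistics

# Hypothetical toy corpus: document id -> text.
CORPUS = {
    "d1": "reset the router by holding the power button",
    "d2": "warranty terms for the router",
    "d3": "firmware update steps for the modem",
}

def lexical_retrieve(query, k=3):
    """Toy lexical retriever: rank documents by number of shared terms."""
    terms = set(query.lower().split())
    scores = {doc_id: len(terms & set(text.split())) for doc_id, text in CORPUS.items()}
    hits = [doc_id for doc_id in scores if scores[doc_id] > 0]  # drop non-matches
    return sorted(hits, key=lambda d: (-scores[d], d))[:k]

def ndcg_reward(ranked_ids, gold_id, k=3):
    """NDCG@k with one relevant document: 1/log2(rank + 1) if found, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id == gold_id:
            return 1.0 / math.log2(rank + 1)
    return 0.0

def group_advantages(rewards):
    """Normalize rewards within a group of sampled rewrites (assumed scheme)."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / std if std > 0 else 0.0 for r in rewards]

# Three hypothetical rewrites of one user question whose answer is in d1.
candidates = [
    "how do I fix my internet box",
    "reset router power button",
    "router warranty",
]
rewards = [ndcg_reward(lexical_retrieve(q), "d1") for q in candidates]
advantages = group_advantages(rewards)
```

The rewrite that shares vocabulary with the target document earns the highest reward under this lexical retriever. Swapping in a different retriever would change which rewrite wins, which is the intuition behind training one rewriter per retriever.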