🤖 AI Summary
Retrieval-augmented generation (RAG) systems face significant challenges in optimizing queries over diverse, unstructured real-world documents—spanning textual and multimodal content—particularly when labeled data is unavailable and heterogeneous retrievers (e.g., lexical, semantic, hybrid, or multimodal) must be supported.
Method: We propose RL-QR, a reinforcement learning-based, annotation-free framework for retriever-specific query rewriting. It synthesizes scenario-question pairs and trains the rewriter with Generalized Reward Policy Optimization (GRPO), which extends query rewriting to both text-only and multimodal databases.
Contribution/Results: On industrial in-house data, RL-QR yields an 11% relative gain in NDCG@3 for multimodal RAG and a 9% gain for lexical retrievers. Rewriters trained for semantic and hybrid retrievers did not improve performance, likely due to training misalignments. The approach enables retriever-specific query optimization without human annotations, offering a scalable option for real-world RAG retrieval.
📝 Abstract
Retrieval-Augmented Generation (RAG) systems rely heavily on effective query formulation to unlock external knowledge, yet optimizing queries for diverse, unstructured real-world documents remains a challenge. We introduce \textbf{RL-QR}, a reinforcement learning framework for retriever-specific query rewriting that eliminates the need for human-annotated datasets and extends applicability to both text-only and multi-modal databases. By synthesizing scenario-question pairs and leveraging Generalized Reward Policy Optimization (GRPO), RL-QR trains query rewriters tailored to specific retrievers, enhancing retrieval performance across varied domains. Experiments on industrial in-house data demonstrate significant improvements, with $\text{RL-QR}_{\text{multi-modal}}$ achieving an 11% relative gain in NDCG@3 for multi-modal RAG and $\text{RL-QR}_{\text{lexical}}$ yielding a 9% gain for lexical retrievers. However, challenges persist with semantic and hybrid retrievers, where rewriters failed to improve performance, likely due to training misalignments. Our findings highlight RL-QR's potential to revolutionize query optimization for RAG systems, offering a scalable, annotation-free solution for real-world retrieval tasks, while identifying avenues for further refinement in semantic retrieval contexts.
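To make the reported metric concrete, the following is a minimal sketch of how NDCG@3 and a relative gain can be computed. The abstract does not say which NDCG variant was used, so the linear-gain form is an assumption here. The function names and the toy relevance lists are hypothetical illustrations, not data or code from the paper.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k results (linear gain, assumed)."""
    # rank is 0-based, so the discount for the first result is log2(2) = 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def relative_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline` (0.11 means 11%)."""
    return (improved - baseline) / baseline


# Toy example: relevance of the top-3 retrieved documents, in ranked order.
original_query = [0, 0, 1]   # the relevant document is ranked third
rewritten_query = [0, 1, 0]  # after rewriting, it is ranked second

before = ndcg_at_k(original_query, 3)
after = ndcg_at_k(rewritten_query, 3)
print(f"NDCG@3 before: {before:.3f}, after: {after:.3f}")
print(f"relative gain: {relative_gain(before, after):.1%}")
```

A relative gain is measured against the baseline score, so an 11% relative gain means the rewritten queries score 1.11 times the baseline NDCG@3. It does not mean the score rose by 11 points.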