🤖 AI Summary
To address two fundamental bottlenecks in retrieval-augmented generation (RAG)—inaccurate retrieval and ineffective context utilization—this paper introduces RAG-RL, the first reasoning language model explicitly designed for RAG. Our method employs a two-stage training paradigm: a supervised warm-up stage followed by PPO-based reinforcement fine-tuning, jointly optimizing retrieval and generation capabilities. Crucially, we propose a RAG-specific curriculum learning strategy that progressively increases task difficulty. Key findings show that stronger generative capacity can offset imperfect retrieval by identifying relevant contexts within larger retrieved sets, and that curriculum-driven RL substantially improves RAG's robustness and generalization. Evaluated on two open-domain question answering benchmarks, RAG-RL consistently outperforms state-of-the-art generative readers. Furthermore, we systematically characterize how curriculum design choices—such as difficulty scheduling and reward shaping—govern overall performance, providing actionable insights for future RAG optimization.
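The curriculum idea above can be sketched in a few lines: order training examples from easy to hard and feed them to the RL fine-tuning loop stage by stage. This is a minimal illustrative sketch, not the paper's implementation; the function names and the use of distractor-passage count as a difficulty proxy are assumptions for illustration.

```python
import random

def curriculum_batches(examples, difficulty_fn, num_stages=3, batch_size=4, seed=0):
    """Yield batches ordered easy-to-hard for curriculum RL post-training.

    examples: list of training items.
    difficulty_fn: maps an item to a numeric difficulty (here, a hypothetical
    proxy: the number of distractor passages mixed into the retrieved context).
    """
    rng = random.Random(seed)
    ordered = sorted(examples, key=difficulty_fn)  # easiest first
    stage_size = (len(ordered) + num_stages - 1) // num_stages
    for s in range(num_stages):
        stage = ordered[s * stage_size:(s + 1) * stage_size]
        rng.shuffle(stage)  # shuffle within a stage; stages stay ordered
        for i in range(0, len(stage), batch_size):
            yield stage[i:i + batch_size]

# Toy usage: difficulty = number of distractor documents in the context.
data = [{"q": f"q{i}", "distractors": d} for i, d in enumerate([5, 0, 3, 1, 4, 2])]
batches = list(curriculum_batches(data, lambda ex: ex["distractors"],
                                  num_stages=2, batch_size=2))
```

Each yielded batch would then drive one RL update; early stages contain few distractors, later stages many, mirroring the paper's progressive-difficulty schedule at a high level.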
📝 Abstract
Recent research highlights the challenges retrieval models face in retrieving useful contexts and the limitations of generation models in effectively utilizing those contexts in retrieval-augmented generation (RAG) settings. To address these challenges, we introduce RAG-RL, the first reasoning language model (RLM) specifically trained for RAG. RAG-RL demonstrates that stronger answer generation models can identify relevant contexts within larger sets of retrieved information -- thereby alleviating the burden on retrievers -- while also being able to utilize those contexts more effectively. Moreover, we show that curriculum design in the reinforcement learning (RL) post-training process is a powerful approach to enhancing model performance. We benchmark our method on two open-domain question-answering datasets, surpassing previous state-of-the-art (SOTA) generative reader models. In addition, we offer empirical insights into various curriculum learning strategies, providing a deeper understanding of their impact on model performance.