RAG-Gym: Optimizing Reasoning and Search Agents with Process Supervision

📅 2025-02-19
🤖 AI Summary
Traditional RAG systems for complex question answering rely on static retrieval and struggle to support multi-step, adaptive search. Method: This paper proposes RAG-Gym, a unified optimization framework that trains information-seeking agents with fine-grained, stepwise process supervision, combining process reward modeling, discriminative LLM-based reward learning, and transferable reward models to jointly optimize query generation and answer reasoning. It also introduces ReSearch, a novel agent architecture that synergizes answer reasoning and search query generation within the RAG-Gym framework. Contribution/Results: Evaluated on four challenging benchmarks, RAG-Gym improves performance by up to 25.6% across various agent architectures, with ReSearch consistently outperforming existing baselines. Empirical results further demonstrate that the trained reward models transfer as verifiers across diverse LLMs.

📝 Abstract
Retrieval-augmented generation (RAG) has shown great potential for knowledge-intensive tasks, but its traditional architectures rely on static retrieval, limiting their effectiveness for complex questions that require sequential information-seeking. While agentic reasoning and search offer a more adaptive approach, most existing methods depend heavily on prompt engineering. In this work, we introduce RAG-Gym, a unified optimization framework that enhances information-seeking agents through fine-grained process supervision at each search step. We also propose ReSearch, a novel agent architecture that synergizes answer reasoning and search query generation within the RAG-Gym framework. Experiments on four challenging datasets show that RAG-Gym improves performance by up to 25.6% across various agent architectures, with ReSearch consistently outperforming existing baselines. Further analysis highlights the effectiveness of advanced LLMs as process reward judges and the transferability of trained reward models as verifiers for different LLMs. Additionally, we examine the scaling properties of training and inference in agentic RAG. The project homepage is available at https://rag-gym.github.io/.
Problem

Research questions and friction points this paper is trying to address.

Enhancing static RAG architectures for complex sequential information-seeking tasks
Reducing reliance on prompt engineering in agentic reasoning and search methods
Synergizing answer reasoning and search query generation in retrieval-augmented systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-grained process supervision for stepwise agent optimization
ReSearch architecture synergizing reasoning and search generation
LLM-based process reward judges enabling transferable verification
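To make the process-supervision idea above concrete, here is a minimal sketch of reward-guided action selection in an agentic RAG loop: at each search step the agent proposes candidate actions, a process reward model scores them, and the highest-scoring one is executed. All names here (`ProcessRewardModel`, `propose_actions`, `retrieve`) are hypothetical stand-ins for illustration, not the paper's actual API; in RAG-Gym the candidates come from an LLM and the scorer is a trained reward model.

```python
# Sketch of process-reward-guided step selection in an agentic RAG loop.
# Hypothetical stand-ins throughout; the paper's implementation differs.

from dataclasses import dataclass, field

@dataclass
class State:
    question: str
    history: list = field(default_factory=list)  # past (query, docs) steps

class ProcessRewardModel:
    """Toy stand-in scorer: rewards queries that introduce new terms."""
    def score(self, state: State, action: str) -> float:
        seen = " ".join(q for q, _ in state.history)
        new_terms = [w for w in action.split() if w not in seen]
        return len(new_terms) / max(len(action.split()), 1)

def propose_actions(state: State) -> list[str]:
    # A real agent would sample candidate queries/answers from an LLM.
    return [state.question,
            state.question + " timeline",
            state.question + " causes"]

def retrieve(query: str) -> list[str]:
    # Placeholder retriever; a real system would query a search index.
    return [f"doc about {query}"]

def search_step(state: State, prm: ProcessRewardModel) -> str:
    # Score every candidate action, execute the best, record the step.
    candidates = propose_actions(state)
    best = max(candidates, key=lambda a: prm.score(state, a))
    state.history.append((best, retrieve(best)))
    return best

state = State(question="why did the bridge collapse")
prm = ProcessRewardModel()
first = search_step(state, prm)
second = search_step(state, prm)
print(first, "|", second)
```

The design choice this illustrates is that supervision attaches to each intermediate step (the chosen query) rather than only to the final answer, which is what distinguishes process reward modeling from outcome-only reward.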
👥 Authors
Guangzhi Xiong (University of Virginia)
Qiao Jin (National Institutes of Health)
Xiao Wang (University of Illinois Urbana-Champaign)
Yin Fang (National Institutes of Health)
Haolin Liu (University of Virginia)
Yifan Yang (National Institutes of Health)
Fangyuan Chen (Dana-Farber Cancer Institute)
Zhixing Song (University of Alabama at Birmingham)
Dengyu Wang (Yale School of Medicine)
Minjia Zhang (University of Illinois Urbana-Champaign)
Zhiyong Lu (Senior Investigator, NLM; Adjunct Professor of CS, UIUC)
Aidong Zhang (University of Virginia)