RAG-Gym: Optimizing Reasoning and Search Agents with Process Supervision

📅 2025-02-19
🤖 AI Summary
Traditional RAG systems for complex question answering rely on static retrieval and struggle to support multi-step, adaptive search. Method: This paper proposes RAG-Gym, a unified optimization framework that trains information-seeking agents with fine-grained, stepwise process supervision, combining process reward modeling, discriminative LLM-based reward learning, and transferable reward models to jointly optimize query generation and answer reasoning. It also introduces ReSearch, a novel agent architecture that synergizes answer reasoning and search query generation within the RAG-Gym framework. Contribution/Results: Evaluated on four challenging benchmarks, RAG-Gym improves performance by up to 25.6% across various agent architectures, with ReSearch consistently outperforming existing baselines. Empirical results further demonstrate that the trained reward models transfer as verifiers across diverse LLMs.

📝 Abstract
Retrieval-augmented generation (RAG) has shown great potential for knowledge-intensive tasks, but its traditional architectures rely on static retrieval, limiting their effectiveness for complex questions that require sequential information-seeking. While agentic reasoning and search offer a more adaptive approach, most existing methods depend heavily on prompt engineering. In this work, we introduce RAG-Gym, a unified optimization framework that enhances information-seeking agents through fine-grained process supervision at each search step. We also propose ReSearch, a novel agent architecture that synergizes answer reasoning and search query generation within the RAG-Gym framework. Experiments on four challenging datasets show that RAG-Gym improves performance by up to 25.6% across various agent architectures, with ReSearch consistently outperforming existing baselines. Further analysis highlights the effectiveness of advanced LLMs as process reward judges and the transferability of trained reward models as verifiers for different LLMs. Additionally, we examine the scaling properties of training and inference in agentic RAG. The project homepage is available at https://rag-gym.github.io/.
Problem

Research questions and friction points this paper is trying to address.

Enhancing static RAG architectures for complex sequential information-seeking tasks
Reducing reliance on prompt engineering in agentic reasoning and search methods
Synergizing answer reasoning and search query generation in retrieval-augmented systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-grained process supervision for stepwise agent optimization
ReSearch architecture synergizing reasoning and search generation
LLM-based process reward judges enabling transferable verification
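To make the process-supervision idea above concrete, here is a minimal sketch of reward-guided action selection in an agentic RAG loop: at each search step the agent proposes candidate actions, a process reward model scores them, and the highest-scoring one is executed. All names here (`ProcessRewardModel`, `propose_actions`, `retrieve`) are hypothetical stand-ins for illustration, not the paper's actual API; in RAG-Gym the candidates come from an LLM and the scorer is a trained reward model.

```python
# Sketch of process-reward-guided step selection in an agentic RAG loop.
# Hypothetical stand-ins throughout; the paper's implementation differs.

from dataclasses import dataclass, field

@dataclass
class State:
    question: str
    history: list = field(default_factory=list)  # past (query, docs) steps

class ProcessRewardModel:
    """Toy stand-in scorer: rewards queries that introduce new terms."""
    def score(self, state: State, action: str) -> float:
        seen = " ".join(q for q, _ in state.history)
        new_terms = [w for w in action.split() if w not in seen]
        return len(new_terms) / max(len(action.split()), 1)

def propose_actions(state: State) -> list[str]:
    # A real agent would sample candidate queries/answers from an LLM.
    return [state.question,
            state.question + " timeline",
            state.question + " causes"]

def retrieve(query: str) -> list[str]:
    # Placeholder retriever; a real system would query a search index.
    return [f"doc about {query}"]

def search_step(state: State, prm: ProcessRewardModel) -> str:
    # Score every candidate action, execute the best, record the step.
    candidates = propose_actions(state)
    best = max(candidates, key=lambda a: prm.score(state, a))
    state.history.append((best, retrieve(best)))
    return best

state = State(question="why did the bridge collapse")
prm = ProcessRewardModel()
first = search_step(state, prm)
second = search_step(state, prm)
print(first, "|", second)
```

The design choice this illustrates is that supervision attaches to each intermediate step (the chosen query) rather than only to the final answer, which is what distinguishes process reward modeling from outcome-only reward.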
👥 Authors
Guangzhi Xiong (University of Virginia)
Qiao Jin (National Institutes of Health)
Xiao Wang (University of Illinois Urbana-Champaign)
Yin Fang (National Institutes of Health)
Haolin Liu (University of Virginia)
Yifan Yang (National Institutes of Health)
Fangyuan Chen (Dana-Farber Cancer Institute)
Zhixing Song (University of Alabama at Birmingham)
Dengyu Wang (Yale School of Medicine)
Minjia Zhang (University of Illinois Urbana-Champaign)
Zhiyong Lu (Senior Investigator, NLM; Adjunct Professor of CS, UIUC)
Aidong Zhang (University of Virginia)