Effective and Transparent RAG: Adaptive-Reward Reinforcement Learning for Decision Traceability

📅 2025-05-19
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Retrieval-Augmented Generation (RAG) suffers from insufficient utilization of retrieved evidence and opaque, uninterpretable reasoning in knowledge-intensive tasks. Method: ARENA (Adaptive-Rewarded Evidence Navigation Agent) is a transparent RAG generator framework trained with reinforcement learning (RL) and adaptive rewards, guiding large language models (LLMs) to identify critical evidence, perform structured reasoning, and generate answers with explicit decision trajectories. Built on structured generation and adaptive reward calculation, ARENA is applied to open-weight models (Qwen2.5-7B-Instruct and Llama3.1-8B-Instruct) and transfers to new datasets without extra training. Contribution/Results: On multi-hop question answering, ARENA outperforms mainstream RAG baselines by 10-30% and is comparable with commercial state-of-the-art models (e.g., OpenAI-o1, DeepSeek-R1), improving both reasoning effectiveness and decision transparency in RAG systems.

📝 Abstract
Retrieval-Augmented Generation (RAG) has significantly improved the performance of large language models (LLMs) on knowledge-intensive domains. However, despite its successes across distinct domains, some challenges remain unsolved: 1) Effectiveness. Existing research mainly focuses on developing more powerful RAG retrievers, leaving open the question of how to enhance the generator's (LLM's) ability to utilize the retrieved information for reasoning and generation. 2) Transparency. Most RAG methods do not identify which retrieved content actually contributes to the reasoning process, resulting in a lack of interpretability and visibility. To address this, we propose ARENA (Adaptive-Rewarded Evidence Navigation Agent), a transparent RAG generator framework trained via reinforcement learning (RL) with our proposed rewards. Based on structured generation and adaptive reward calculation, the RL-based training enables the model to identify key evidence, perform structured reasoning, and generate answers with interpretable decision traces. Applied to Qwen2.5-7B-Instruct and Llama3.1-8B-Instruct, extensive experiments against various RAG baselines demonstrate that our model achieves 10-30% improvements on all multi-hop QA datasets, comparable with state-of-the-art commercially developed LLMs (e.g., OpenAI-o1, DeepSeek-R1). Further analyses show that ARENA has strong flexibility to be adopted on new datasets without extra training. Our models and code are publicly released.
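
To make the training signal concrete, below is a minimal Python sketch of how an adaptive reward combining format, answer-accuracy, and evidence-selection terms could be computed for RL training of a RAG generator. The component names, weights, and the accuracy-dependent weighting rule are illustrative assumptions; the paper's exact reward definitions are not given on this page.

```python
# Minimal illustrative sketch (not ARENA's published reward): combines
# format, answer-accuracy, and evidence-selection signals into one scalar.
# Component names, weights, and the weighting rule are assumptions.

def adaptive_reward(pred_answer: str,
                    gold_answer: str,
                    cited_ids: set,
                    gold_evidence_ids: set,
                    well_formatted: bool) -> float:
    """Scalar reward for RL training of a structured RAG generator."""
    # 1) Format: the output contains the required structured fields
    #    (selected evidence, reasoning, final answer).
    format_r = 1.0 if well_formatted else 0.0

    # 2) Accuracy: exact match with the gold answer (a real system might
    #    use token-level F1 or a verifier instead).
    accuracy_r = 1.0 if pred_answer.strip().lower() == gold_answer.strip().lower() else 0.0

    # 3) Evidence: overlap between cited passages and gold supporting passages.
    if cited_ids:
        overlap = len(cited_ids & gold_evidence_ids)
        evidence_r = overlap / max(len(cited_ids), len(gold_evidence_ids))
    else:
        evidence_r = 0.0

    # 4) "Adaptive" weighting (hypothetical): emphasize evidence selection
    #    while the answer is still wrong, then shift weight to accuracy.
    w_evidence = 0.5 if accuracy_r == 0.0 else 0.2
    return 0.2 * format_r + (0.8 - w_evidence) * accuracy_r + w_evidence * evidence_r

# Example call: correct answer, one of two gold passages cited.
# adaptive_reward("Paris", "paris", {2}, {2, 5}, True)
```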
Problem

Research questions and friction points this paper is trying to address.

Enhancing LLM's ability to utilize retrieved information for reasoning
Improving transparency in RAG by identifying key evidence contributions
Enabling interpretable decision traces in RAG-generated answers
Innovation

Methods, ideas, or system contributions that make the work stand out.

ARENA framework enhances RAG via reinforcement learning
Adaptive rewards improve evidence identification and reasoning
Structured generation ensures interpretable decision traces (see the sketch after this list)
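
As a concrete illustration of the structured-generation idea, the sketch below shows one way a prompt could request tagged output sections and how the resulting decision trace could be parsed for inspection or reward computation. The tag names (<evidence>, <analysis>, <answer>) and the prompt wording are assumptions for illustration, not ARENA's actual output schema.

```python
import re

# Illustrative only: the tags and prompt below are assumed, not taken from
# the paper; they show how a decision trace could be made explicit.

STRUCTURED_PROMPT = (
    "Answer the question using the numbered passages.\n"
    "Respond with three tagged sections:\n"
    "<evidence>passage numbers you relied on</evidence>\n"
    "<analysis>step-by-step reasoning over those passages</analysis>\n"
    "<answer>final short answer</answer>"
)

def parse_decision_trace(generation: str) -> dict:
    """Extract the cited evidence, reasoning, and answer from a structured
    generation so the decision trajectory can be inspected or rewarded."""
    def grab(tag: str) -> str:
        m = re.search(rf"<{tag}>(.*?)</{tag}>", generation, re.DOTALL)
        return m.group(1).strip() if m else ""

    evidence_field = grab("evidence")
    return {
        "evidence_ids": {int(n) for n in re.findall(r"\d+", evidence_field)},
        "reasoning": grab("analysis"),
        "answer": grab("answer"),
        "well_formatted": all(grab(t) for t in ("evidence", "analysis", "answer")),
    }
```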
Jingyi Ren
Tsinghua University
Yekun Xu
Unknown affiliation
Xiaolong Wang
Department of Computer Science and Technology, Tsinghua University, Beijing, China; Institute for AI Industry Research (AIR), Tsinghua University, Beijing, China
Weitao Li
Department of Computer Science and Technology, Tsinghua University
RAG · RL · Agents · Medicine
Weizhi Ma
Tsinghua University
LLM and Agents · Recommendation · AI for Healthcare
Yang Liu
Department of Computer Science and Technology, Tsinghua University, Beijing, China; Institute for AI Industry Research (AIR), Tsinghua University, Beijing, China