Effective and Transparent RAG: Adaptive-Reward Reinforcement Learning for Decision Traceability

📅 2025-05-19
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Retrieval-Augmented Generation (RAG) suffers from insufficient utilization of retrieved evidence and opaque, uninterpretable reasoning in knowledge-intensive tasks. Method: ARENA (Adaptive-Rewarded Evidence Navigation Agent) is a transparent RAG generator framework trained with reinforcement learning (RL) and adaptive rewards, guiding large language models (LLMs) to identify critical evidence, perform structured reasoning, and generate answers with explicit decision trajectories. Built on structured generation and adaptive reward calculation, ARENA is applied to open-weight models (Qwen2.5-7B-Instruct and Llama3.1-8B-Instruct) and transfers to new datasets without extra training. Contribution/Results: On multi-hop question answering, ARENA outperforms mainstream RAG baselines by 10-30% and is comparable with commercial state-of-the-art models (e.g., OpenAI-o1, DeepSeek-R1), improving both reasoning effectiveness and decision transparency in RAG systems.

📝 Abstract
Retrieval-Augmented Generation (RAG) has significantly improved the performance of large language models (LLMs) on knowledge-intensive domains. However, despite its successes across distinct domains, some challenges remain unsolved: 1) Effectiveness. Existing research mainly focuses on developing more powerful RAG retrievers, leaving open the question of how to enhance the generator's (LLM's) ability to utilize the retrieved information for reasoning and generation. 2) Transparency. Most RAG methods do not identify which retrieved content actually contributes to the reasoning process, resulting in a lack of interpretability and visibility. To address this, we propose ARENA (Adaptive-Rewarded Evidence Navigation Agent), a transparent RAG generator framework trained via reinforcement learning (RL) with our proposed rewards. Based on structured generation and adaptive reward calculation, the RL-based training enables the model to identify key evidence, perform structured reasoning, and generate answers with interpretable decision traces. Applied to Qwen2.5-7B-Instruct and Llama3.1-8B-Instruct, extensive experiments against various RAG baselines demonstrate that our model achieves 10-30% improvements on all multi-hop QA datasets, comparable with state-of-the-art commercially developed LLMs (e.g., OpenAI-o1, DeepSeek-R1). Further analyses show that ARENA has strong flexibility to be adopted on new datasets without extra training. Our models and code are publicly released.
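
To make the training signal concrete, below is a minimal Python sketch of how an adaptive reward combining format, answer-accuracy, and evidence-selection terms could be computed for RL training of a RAG generator. The component names, weights, and the accuracy-dependent weighting rule are illustrative assumptions; the paper's exact reward definitions are not given on this page.

```python
# Minimal illustrative sketch (not ARENA's published reward): combines
# format, answer-accuracy, and evidence-selection signals into one scalar.
# Component names, weights, and the weighting rule are assumptions.

def adaptive_reward(pred_answer: str,
                    gold_answer: str,
                    cited_ids: set,
                    gold_evidence_ids: set,
                    well_formatted: bool) -> float:
    """Scalar reward for RL training of a structured RAG generator."""
    # 1) Format: the output contains the required structured fields
    #    (selected evidence, reasoning, final answer).
    format_r = 1.0 if well_formatted else 0.0

    # 2) Accuracy: exact match with the gold answer (a real system might
    #    use token-level F1 or a verifier instead).
    accuracy_r = 1.0 if pred_answer.strip().lower() == gold_answer.strip().lower() else 0.0

    # 3) Evidence: overlap between cited passages and gold supporting passages.
    if cited_ids:
        overlap = len(cited_ids & gold_evidence_ids)
        evidence_r = overlap / max(len(cited_ids), len(gold_evidence_ids))
    else:
        evidence_r = 0.0

    # 4) "Adaptive" weighting (hypothetical): emphasize evidence selection
    #    while the answer is still wrong, then shift weight to accuracy.
    w_evidence = 0.5 if accuracy_r == 0.0 else 0.2
    return 0.2 * format_r + (0.8 - w_evidence) * accuracy_r + w_evidence * evidence_r

# Example call: correct answer, one of two gold passages cited.
# adaptive_reward("Paris", "paris", {2}, {2, 5}, True)
```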
Problem

Research questions and friction points this paper is trying to address.

Enhancing LLM's ability to utilize retrieved information for reasoning
Improving transparency in RAG by identifying key evidence contributions
Enabling interpretable decision traces in RAG-generated answers
Innovation

Methods, ideas, or system contributions that make the work stand out.

ARENA framework enhances RAG via reinforcement learning
Adaptive rewards improve evidence identification and reasoning
Structured generation ensures interpretable decision traces (see the sketch after this list)
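
As a concrete illustration of the structured-generation idea, the sketch below shows one way a prompt could request tagged output sections and how the resulting decision trace could be parsed for inspection or reward computation. The tag names (<evidence>, <analysis>, <answer>) and the prompt wording are assumptions for illustration, not ARENA's actual output schema.

```python
import re

# Illustrative only: the tags and prompt below are assumed, not taken from
# the paper; they show how a decision trace could be made explicit.

STRUCTURED_PROMPT = (
    "Answer the question using the numbered passages.\n"
    "Respond with three tagged sections:\n"
    "<evidence>passage numbers you relied on</evidence>\n"
    "<analysis>step-by-step reasoning over those passages</analysis>\n"
    "<answer>final short answer</answer>"
)

def parse_decision_trace(generation: str) -> dict:
    """Extract the cited evidence, reasoning, and answer from a structured
    generation so the decision trajectory can be inspected or rewarded."""
    def grab(tag: str) -> str:
        m = re.search(rf"<{tag}>(.*?)</{tag}>", generation, re.DOTALL)
        return m.group(1).strip() if m else ""

    evidence_field = grab("evidence")
    return {
        "evidence_ids": {int(n) for n in re.findall(r"\d+", evidence_field)},
        "reasoning": grab("analysis"),
        "answer": grab("answer"),
        "well_formatted": all(grab(t) for t in ("evidence", "analysis", "answer")),
    }
```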
Jingyi Ren
Tsinghua University
Yekun Xu
Unknown affiliation
Xiaolong Wang
Department of Computer Science and Technology, Tsinghua University, Beijing, China; Institute for AI Industry Research (AIR), Tsinghua University, Beijing, China
Weitao Li
Department of Computer Science and Technology, Tsinghua University
RAG · RL · Agents · Medicine
Weizhi Ma
Tsinghua University
LLM and Agents · Recommendation · AI for Healthcare
Yang Liu
Department of Computer Science and Technology, Tsinghua University, Beijing, China; Institute for AI Industry Research (AIR), Tsinghua University, Beijing, China