Learning to Extract Rational Evidence via Reinforcement Learning for Retrieval-Augmented Generation

📅 2025-07-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Retrieval noise severely degrades the generation quality of large language models (LLMs) in retrieval-augmented generation (RAG). Existing evidence extraction methods lack explicit reasoning, leading to erroneous omission of critical clues and poor generalization. To address this, we propose a reinforcement learning–based rational evidence extraction framework. Our method frames evidence reasoning and evidence extraction as one unified response to enable end-to-end training; applies a knowledge token masking mechanism to disentangle reasoning-based and extraction-based answers; and designs a verifiable, multi-dimensional reward function incorporating answer correctness, evidence conciseness, and format compliance. Evaluated on three benchmark datasets, our approach significantly improves downstream task accuracy and produces more precise, concise, and highly relevant evidence, demonstrating strong practical efficacy for online RAG system deployment.

📝 Abstract
Retrieval-Augmented Generation (RAG) effectively improves the accuracy of Large Language Models (LLMs). However, retrieval noise significantly degrades the quality of LLMs' generation, necessitating the development of denoising mechanisms. Previous methods extract evidence directly, without explicit reasoning, which risks filtering out key clues and generalizes poorly. To this end, we propose LEAR, which learns to extract rational evidence by (1) explicitly reasoning to identify potential cues within retrieved contents first, and then (2) consciously extracting to avoid omitting any key cues helpful for answering questions. Specifically, we frame evidence reasoning and evidence extraction into one unified response for end-to-end training; apply knowledge token masks for disentanglement to derive reasoning-based and extraction-based answers; and devise three types of verifiable reward functions, covering answer, length, and format, to update the model via a policy optimization algorithm. Extensive experiments on three benchmark datasets show the effectiveness of LEAR, providing compact and high-quality evidence, improving the accuracy of downstream tasks, and promoting effective application in online RAG systems.
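The three verifiable reward signals described in the abstract (answer, length, format) could be combined roughly as sketched below. This is a minimal illustration, not the paper's implementation: the exact reward forms, the `<think>`/`<evidence>` tag names, and the weights are assumptions.

```python
# Hypothetical sketch of a multi-dimensional verifiable reward for
# evidence extraction. Reward forms, tags, and weights are assumptions.
import re

def answer_reward(predicted: str, gold: str) -> float:
    """1.0 if the derived answer matches the gold answer after normalization."""
    norm = lambda s: " ".join(s.lower().split())
    return 1.0 if norm(predicted) == norm(gold) else 0.0

def length_reward(evidence: str, context: str) -> float:
    """Reward conciseness: shorter evidence relative to the retrieved context scores higher."""
    ratio = len(evidence.split()) / max(len(context.split()), 1)
    return max(0.0, 1.0 - ratio)

def format_reward(response: str) -> float:
    """1.0 if the unified response follows the <think>...</think><evidence>...</evidence> layout."""
    ok = re.fullmatch(r"\s*<think>.*</think>\s*<evidence>.*</evidence>\s*",
                      response, flags=re.DOTALL)
    return 1.0 if ok else 0.0

def total_reward(response, predicted, gold, evidence, context,
                 w_answer=1.0, w_length=0.2, w_format=0.2) -> float:
    # Weighted sum of the three verifiable signals; weights are illustrative.
    return (w_answer * answer_reward(predicted, gold)
            + w_length * length_reward(evidence, context)
            + w_format * format_reward(response))
```

In an RL loop, this scalar would score each sampled rollout before a policy optimization step updates the extractor.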
Problem

Research questions and friction points this paper is trying to address.

Reducing retrieval noise impact on LLM generation quality
Improving evidence extraction via explicit reasoning and conscious extraction
Enhancing generalization and accuracy in Retrieval-Augmented Generation systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Explicit reasoning to identify potential cues
Conscious extraction to avoid omitting key clues
Verifiable reward functions for policy optimization
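The knowledge-token-masking idea from the abstract, which derives separate reasoning-based and extraction-based answers from one unified response, could be sketched as follows. Tag names, tokenization by whitespace, and the boolean-mask representation are all assumptions for illustration.

```python
# Hypothetical sketch of knowledge token masking: one unified response holds
# both reasoning and evidence; masking one segment lets the trainer derive a
# reasoning-based answer and an extraction-based answer from the same rollout.
import re

def split_response(response: str):
    """Split a unified response into its reasoning and evidence segments."""
    think = re.search(r"<think>(.*?)</think>", response, re.DOTALL)
    evidence = re.search(r"<evidence>(.*?)</evidence>", response, re.DOTALL)
    return (think.group(1) if think else "",
            evidence.group(1) if evidence else "")

def knowledge_token_mask(response: str, keep: str = "reasoning"):
    """Return (tokens, mask): mask[i] is True for tokens the answerer may attend to."""
    reasoning, evidence = split_response(response)
    tokens, mask = [], []
    for segment, name in ((reasoning, "reasoning"), (evidence, "evidence")):
        for tok in segment.split():
            tokens.append(tok)
            mask.append(name == keep)
    return tokens, mask
```

Answering once with `keep="reasoning"` and once with `keep="evidence"` yields the two disentangled answers that the verifiable rewards can then score.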
Xinping Zhao
Harbin Institute of Technology (Shenzhen)
Shouzheng Huang
Harbin Institute of Technology (Shenzhen)
Yan Zhong
Peking University
Xinshuo Hu
Harbin Institute of Technology (Shenzhen)
Large Language Model · Text Generation · Truthfulness
Baotian Hu
Harbin Institute of Technology (Shenzhen)
LLM · MLLM · NLP
Min Zhang
Harbin Institute of Technology (Shenzhen)