🤖 AI Summary
RAG systems often suffer from reduced generation reliability due to conflicts between the model's internal knowledge and external retrieved evidence, arising from knowledge inconsistencies or retrieval noise. To address this, we propose CARE-RAG, the first RAG framework incorporating a conflict-aware evidence summarization mechanism. It jointly optimizes evidence through parameter-aware modeling (leveraging LLaMA3.2-based knowledge distillation and parameter difference tracking) and context-aware refinement (employing dynamic context filtering and conflict-driven summarization), complemented by QA-based evidence reconciliation for trustworthy multi-source fusion. Our key innovation lies in explicitly modeling conflict identification as the primary driver for summary generation, thereby enhancing the robustness of evidence integration. Experiments on a revised question-answering benchmark demonstrate that CARE-RAG significantly outperforms state-of-the-art RAG baselines, particularly in high-noise and strong-conflict scenarios, with marked improvements in both accuracy and stability.
📝 Abstract
Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by integrating their parametric knowledge with external retrieved content. However, knowledge conflicts caused by internal inconsistencies or noisy retrieved content can severely undermine the generation reliability of RAG systems. In this work, we argue that LLMs should rethink all evidence, including both retrieved content and internal knowledge, before generating responses. We propose CARE-RAG (Conflict-Aware and Reliable Evidence for RAG), a novel framework that improves trustworthiness through conflict-driven summarization of all available evidence. CARE-RAG first derives parameter-aware evidence by comparing parameter records to identify diverse internal perspectives. It then refines the retrieved evidence into context-aware evidence, removing irrelevant or misleading content. To detect and summarize conflicts, we distill a 3B LLaMA3.2 model to perform conflict-driven summarization, enabling reliable synthesis across multiple sources. To further ensure evaluation integrity, we introduce a QA Repair step to correct outdated or ambiguous benchmark answers. Experiments on revised QA datasets with retrieval data show that CARE-RAG consistently outperforms strong RAG baselines, especially in scenarios with noisy or conflicting evidence.
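The abstract describes a three-stage pipeline: derive parameter-aware (internal) evidence, refine retrieved passages into context-aware evidence, then perform conflict-driven summarization before answering. A minimal sketch of how those stages might compose is shown below; all function names, the toy keyword-overlap filter, and the substring-based conflict check are illustrative assumptions, not the paper's implementation (which uses a distilled 3B LLaMA3.2 summarizer for the conflict step).

```python
# Hypothetical sketch of the CARE-RAG pipeline stages described above.
# Stage names and logic are illustrative assumptions; the actual framework
# performs these steps with LLM calls, not string matching.

def derive_parameter_aware_evidence(closed_book_answers):
    """Collect the model's internal (parametric) answers, keeping
    distinct variants to surface internal disagreement."""
    return sorted(set(closed_book_answers))

def refine_context_evidence(question, passages):
    """Toy context-aware refinement: keep only passages that share at
    least one word with the question, dropping irrelevant noise."""
    q_words = set(question.lower().split())
    return [p for p in passages if q_words & set(p.lower().split())]

def conflict_driven_summarize(internal_answers, passages):
    """Flag a conflict when no internal answer is supported by any
    retrieved passage, then bundle all evidence for the generator."""
    supported = [a for a in internal_answers
                 if any(a.lower() in p.lower() for p in passages)]
    conflict = bool(internal_answers) and not supported
    return {"conflict": conflict,
            "evidence": passages + [a for a in internal_answers
                                    if a not in supported]}

# Demo: internal and retrieved evidence agree, so no conflict is flagged.
question = "who wrote hamlet"
internal = derive_parameter_aware_evidence(
    ["William Shakespeare", "Shakespeare"])
refined = refine_context_evidence(
    question,
    ["Hamlet was written by William Shakespeare.",
     "Unrelated noise on cooking."])
summary = conflict_driven_summarize(internal, refined)
print(len(refined), summary["conflict"])  # → 1 False
```

The key design point mirrored here is that conflict detection runs before answer generation, so the generator receives an explicit conflict signal alongside the merged evidence rather than silently preferring one source.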