🤖 AI Summary
To address factual inconsistency in retrieval-augmented generation (RAG) caused by noisy retrieved documents, this paper proposes a robust knowledge-utilization framework. First, it constructs structured knowledge representations that explicitly model factual logic, enabling fine-grained error detection during training. Second, it introduces a Dense Direct Preference Optimization (DDPO) objective that weights training toward correcting the most critical errors in generated content. Third, it devises a contrastive correction data-generation mechanism that preserves semantic consistency while rectifying factual errors, synthesizing high-quality training samples with minimal human annotation. Experiments across multiple large language models and model scales demonstrate substantial improvements in factual accuracy and cross-domain generalization. The framework establishes a practical paradigm for enhancing RAG robustness with modest training data, offering both theoretical insight and practical utility for reliable knowledge-grounded generation.
📝 Abstract
Retrieval-Augmented Generation (RAG) enables large language models (LLMs) to access broader knowledge sources, yet factual inconsistencies persist due to noise in retrieved documents, even with advanced retrieval methods. We demonstrate that enhancing generative models' capacity to process noisy content is equally critical for robust performance. In this paper, we present KARE-RAG (Knowledge-Aware Refinement and Enhancement for RAG), which improves knowledge utilization through three key innovations: (1) structured knowledge representations that facilitate error detection during training, (2) Dense Direct Preference Optimization (DDPO), a refined training objective that prioritizes correction of critical errors, and (3) a contrastive data generation pipeline that maintains semantic consistency while rectifying factual inaccuracies. Experiments show our method significantly enhances standard RAG pipelines across model scales, improving both in-domain and out-of-domain task performance without compromising general capabilities. Notably, these gains are achieved with modest training data, suggesting data-efficient optimization is possible through targeted learning strategies. Our findings establish a new direction for RAG improvement: by improving how models learn to process retrieved content, we can enhance performance across diverse inference paradigms. All data and code will be publicly available on GitHub.
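To make the "dense" idea behind DDPO concrete, the sketch below shows a token-weighted variant of the standard DPO loss. This is an illustration, not the paper's implementation: the function name `dense_dpo_loss` and the per-token weighting scheme are assumptions; only the underlying DPO formulation (maximizing the log-sigmoid of the scaled preference margin between chosen and rejected responses, relative to a frozen reference model) is standard. The intuition is that upweighting the tokens where the corrected and erroneous outputs actually differ focuses the preference signal on critical errors.

```python
import math

def dense_dpo_loss(pol_w, pol_l, ref_w, ref_l, wts_w, wts_l, beta=0.1):
    """Token-weighted (dense) DPO loss sketch.

    pol_* / ref_*: per-token log-probs of the chosen (w) and rejected (l)
    responses under the policy and the frozen reference model.
    wts_*: hypothetical per-token importance weights, e.g. larger on tokens
    where the corrected and erroneous outputs disagree. With all weights
    equal to 1 this reduces to vanilla sequence-level DPO.
    """
    # Weighted log-ratio margin of each response vs. the reference model.
    margin_w = sum(w * (p - r) for w, p, r in zip(wts_w, pol_w, ref_w))
    margin_l = sum(w * (p - r) for w, p, r in zip(wts_l, pol_l, ref_l))
    # DPO objective: maximize log sigmoid(beta * (margin_w - margin_l)),
    # i.e. minimize its negation.
    z = beta * (margin_w - margin_l)
    return -math.log(1.0 / (1.0 + math.exp(-z)))

# Uniform weights: chosen response is preferred, so loss < log(2).
base = dense_dpo_loss([-1.0], [-1.0], [-1.5], [-0.5], [1.0], [1.0])
# Upweighting the differing token widens the margin and lowers the loss.
dense = dense_dpo_loss([-1.0], [-1.0], [-1.5], [-0.5], [2.0], [2.0])
```

Here `base` corresponds to ordinary DPO on a single-token pair, and `dense` shows how amplifying the weight on an error-bearing token strengthens the preference gradient toward the correction.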