Unveiling Knowledge Utilization Mechanisms in LLM-based Retrieval-Augmented Generation

📅 2025-05-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the underexplored mechanism by which large language models (LLMs) integrate internal parametric knowledge with external retrieved knowledge in retrieval-augmented generation (RAG). To this end, we propose a four-stage knowledge flow model (refinement → elicitation → expression → contestation), establishing a hierarchical analytical framework for knowledge integration. We introduce Knowledge Activation Probability Entropy (KAPE), a novel metric quantifying neuron-level preference toward knowledge sources, and uncover complementary roles of multi-head attention and MLP layers in knowledge formation. Through macro-level knowledge flow analysis, module-specific neuron intervention, inter-layer dynamic tracking, and selective ablation experiments, we achieve precise attribution and controllable modulation of the LLM's dependence on each knowledge source. Our approach significantly enhances RAG's interpretability and reliability, providing a transparent and robust foundation for knowledge-intensive generation tasks.

📝 Abstract
Considering the inherent limitations of parametric knowledge in large language models (LLMs), retrieval-augmented generation (RAG) is widely employed to expand their knowledge scope. Since RAG has shown promise in knowledge-intensive tasks like open-domain question answering, its broader application to complex tasks and intelligent assistants has further advanced its utility. Despite this progress, the underlying knowledge utilization mechanisms of LLM-based RAG remain underexplored. In this paper, we present a systematic investigation of the intrinsic mechanisms by which LLMs integrate internal (parametric) and external (retrieved) knowledge in RAG scenarios. Specifically, we employ knowledge stream analysis at the macroscopic level, and investigate the function of individual modules at the microscopic level. Drawing on knowledge streaming analyses, we decompose the knowledge utilization process into four distinct stages within LLM layers: knowledge refinement, knowledge elicitation, knowledge expression, and knowledge contestation. We further demonstrate that the relevance of passages guides the streaming of knowledge through these stages. At the module level, we introduce a new method, knowledge activation probability entropy (KAPE), for identifying neurons associated with either internal or external knowledge. By selectively deactivating these neurons, we achieve targeted shifts in the LLM's reliance on one knowledge source over the other. Moreover, we discern complementary roles for multi-head attention and multi-layer perceptron layers during knowledge formation. These insights offer a foundation for improving interpretability and reliability in retrieval-augmented LLMs, paving the way for more robust and transparent generative solutions in knowledge-intensive domains.
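The abstract names KAPE but does not give its formula. A minimal sketch, assuming KAPE is the Shannon entropy of a neuron's activation probabilities across knowledge sources (low entropy = strong preference for one source; the probability values below are illustrative):

```python
import numpy as np

def kape(activation_probs):
    """Knowledge Activation Probability Entropy (sketch).

    activation_probs: shape (n_neurons, n_sources), where entry [i, s]
    is the probability that neuron i activates when the model answers
    from source s (e.g. parametric vs. retrieved knowledge).
    Returns one entropy score per neuron.
    """
    p = np.asarray(activation_probs, dtype=float)
    p = p / p.sum(axis=1, keepdims=True)  # normalize over sources
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(p > 0, p * np.log(p), 0.0)  # 0*log 0 := 0
    return -terms.sum(axis=1)

# Neuron 0 strongly prefers retrieved knowledge; neuron 1 is indifferent.
scores = kape([[0.95, 0.05], [0.5, 0.5]])
# scores[0] is small; scores[1] equals log(2), the maximum for 2 sources
```

Under this reading, ranking neurons by KAPE and taking the low-entropy extremes would yield the internal- vs. external-knowledge neurons that the paper deactivates.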
Problem

Research questions and friction points this paper is trying to address.

Exploring knowledge use mechanisms in LLM-based RAG systems
Analyzing how LLMs integrate internal and external knowledge
Improving interpretability and reliability of retrieval-augmented LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decomposes knowledge streaming into four stages across LLM layers
Introduces KAPE to identify knowledge-specific neurons and steer knowledge reliance
Reveals complementary roles of multi-head attention and MLP layers in knowledge formation
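The selective-deactivation experiment can be sketched with a toy feed-forward (FFN) layer; the layer sizes, the ReLU, and the neuron indices below are illustrative assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def ffn_forward(x, W, b, deactivated=()):
    """One FFN hidden layer with selected neurons ablated.

    `deactivated` holds indices of hypothetical KAPE-identified
    neurons; zeroing their activations mimics the paper's targeted
    intervention on one knowledge source.
    """
    h = np.maximum(x @ W + b, 0.0)   # ReLU hidden activations
    h[:, list(deactivated)] = 0.0    # silence the chosen neurons
    return h

W = rng.standard_normal((4, 8))
b = rng.standard_normal(8)
x = rng.standard_normal((2, 4))
h = ffn_forward(x, W, b, deactivated=[0, 3])
# columns 0 and 3 of h are zero for every input row
```

In an actual transformer the same effect is typically obtained with a forward hook on the MLP sublayer; the paper reports that such ablations shift the model's reliance between parametric and retrieved knowledge.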