🤖 AI Summary
How large language models (LLMs) integrate retrieved knowledge through in-context learning during retrieval-augmented question answering remains largely opaque, which limits transparency and control over knowledge integration.
Method: We propose an attribution-based framework for analyzing the function of attention heads. It identifies two specialized head types: *in-context heads*, which comprehend instructions and retrieve relevant facts from the prompt, and *parametric heads*, which store entities' relational knowledge learned during training. We then extract function vectors from these heads and modify the heads' attention weights, an intervention that allows interpretable, targeted manipulation of the answer generation process.
Contribution/Results: By combining attribution analysis, function vector extraction, and knowledge provenance tracing, the method traces which sources of knowledge are used during retrieval-augmented inference. The analysis shows how the identified heads influence answer generation, improving the interpretability and controllability of in-context learning and providing a practical basis for safer, more transparent retrieval-augmented LLMs.
📝 Abstract
Large language models are able to exploit in-context learning to access external knowledge beyond their training data through retrieval augmentation. While this approach is promising, its inner workings remain unclear. In this work, we shed light on the mechanism of in-context retrieval augmentation for question answering by viewing a prompt as a composition of informational components. We propose an attribution-based method to identify specialized attention heads, revealing in-context heads that comprehend instructions and retrieve relevant contextual information, and parametric heads that store entities' relational knowledge. To better understand their roles, we extract function vectors and modify the heads' attention weights to show how they can influence the answer generation process. Finally, we leverage the gained insights to trace the sources of knowledge used during inference, paving the way towards safer and more transparent language models.
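To make the idea of scoring attention heads concrete, here is a minimal sketch on a toy single-layer attention block with random weights. It is not the paper's attribution method: it swaps in simple zero-ablation, scoring each head by how much the answer token's logit drops when that head's output is removed. All names, shapes, and weights (`head_outputs`, `answer_logit`, `head_scores`, the toy dimensions) are assumptions made for illustration.

```python
import numpy as np

def head_outputs(x, Wq, Wk, Wv):
    """Per-head attention output at the last position.
    Shapes: x (T, d); Wq, Wk, Wv (H, d, dh). Returns (H, dh)."""
    q = np.einsum("d,hdk->hk", x[-1], Wq)        # query of the last token, per head
    k = np.einsum("td,hdk->htk", x, Wk)          # keys for all positions
    v = np.einsum("td,hdk->htk", x, Wv)          # values for all positions
    scores = np.einsum("hk,htk->ht", q, k) / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True) # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return np.einsum("ht,htk->hk", attn, v)

def answer_logit(heads, Wo, unembed, answer_id, mask=None):
    """Answer-token logit from the summed (optionally masked) head contributions."""
    if mask is not None:
        heads = heads * mask[:, None]            # mask[h] = 0 removes head h
    resid = np.einsum("hk,hkd->d", heads, Wo)    # project heads into the residual stream
    return float(resid @ unembed[:, answer_id])

def head_scores(x, Wq, Wk, Wv, Wo, unembed, answer_id):
    """Score each head by the logit drop caused by zero-ablating it."""
    heads = head_outputs(x, Wq, Wk, Wv)
    base = answer_logit(heads, Wo, unembed, answer_id)
    n_heads = heads.shape[0]
    scores = np.empty(n_heads)
    for h in range(n_heads):
        mask = np.ones(n_heads)
        mask[h] = 0.0
        scores[h] = base - answer_logit(heads, Wo, unembed, answer_id, mask)
    return scores

# Toy setup (hypothetical sizes): 6 tokens, model width 16, 4 heads, vocabulary of 10.
rng = np.random.default_rng(0)
T, d, H, dh, V = 6, 16, 4, 4, 10
x = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(H, d, dh)) for _ in range(3))
Wo = rng.normal(size=(H, dh, d))
unembed = rng.normal(size=(d, V))

scores = head_scores(x, Wq, Wk, Wv, Wo, unembed, answer_id=3)
print("heads ranked by ablation score:", np.argsort(-scores))
```

Because this toy logit is linear in the head outputs, the ablation scores sum exactly to the full logit. A real transformer has nonlinearities and later layers between a head and the output, which is one reason a dedicated attribution method is needed there.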