🤖 AI Summary
This work addresses the poorly understood mechanisms of factual knowledge storage and representation in large language models (LLMs), identifying two critical limitations of the dominant “knowledge localization” (KL) hypothesis: its overly rigid assumption that knowledge resides exclusively in sparse neuron subsets, and its neglect of the attention mechanism’s role. To overcome these, we propose the more general “query localization” (QL) hypothesis, framing knowledge modification as a consistency optimization problem over query-key interactions. We accordingly design a consistency-aware knowledge neuron fine-tuning method. Through rigorous analysis (knowledge neuron identification, attention visualization, causal intervention, and 39 sets of experiments), we empirically demonstrate that QL substantially outperforms KL. On diverse knowledge editing benchmarks, our approach achieves an average accuracy improvement of 12.7%. Crucially, this is the first study to provide empirical evidence for a query-driven mechanism underlying knowledge representation in LLMs.
📝 Abstract
Large language models (LLMs) store extensive factual knowledge, but the mechanisms behind how they store and express this knowledge remain unclear. The Knowledge Neuron (KN) thesis is a prominent theory for explaining these mechanisms. This theory is based on the Knowledge Localization (KL) assumption, which suggests that a fact can be localized to a few knowledge storage units, namely knowledge neurons. However, this assumption has two limitations: first, it may be too rigid regarding knowledge storage, and second, it neglects the role of the attention module in knowledge expression. In this paper, we first re-examine the KL assumption and demonstrate that its limitations do indeed exist. To address these, we then present two new findings, each targeting one of the limitations: one focusing on knowledge storage and the other on knowledge expression. We summarize these findings as the Query Localization (QL) assumption and argue that the KL assumption can be viewed as a simplification of the QL assumption. Based on the QL assumption, we further propose the Consistency-Aware KN modification method, which improves the performance of knowledge modification, further validating our new assumption. We conduct 39 sets of experiments, along with additional visualization experiments, to rigorously confirm our conclusions. Code is available at https://github.com/heng840/KnowledgeLocalization.