🤖 AI Summary
This work addresses the challenge of dynamically balancing contextual information against prior knowledge during language model inference. Methodologically, the authors introduce a controllable context-sensitivity task and, through layer-wise importance analysis and linear probing, identify a one-dimensional subspace in a single layer (a "context-sensitivity knob") that encodes whether the model answers from its context or its prior knowledge. Crucially, a linear intervention within this subspace at that single layer suffices to modulate the model's reliance on input context. Key contributions include: (i) transferability and interpretability of the knob: although the subspace is identified in a fine-tuned model, the same subspace is effective in the non-fine-tuned instruct and base models of that family (Llama-3.1, Mistral-v0.3, Gemma-2); (ii) a strong correlation between a model's performance and how distinctly context-agreeing and context-ignoring answers separate within the subspace; and (iii) fine-tuned models solving the task with high accuracy (85–95%).
📝 Abstract
When making predictions, a language model must trade off how much it relies on its context vs. its prior knowledge. Choosing how sensitive the model is to its context is a fundamental functionality, as it enables the model to excel at tasks like retrieval-augmented generation and question answering. In this paper, we search for a knob that controls this sensitivity, determining whether language models answer from the context or their prior knowledge. To guide this search, we design a task for controllable context sensitivity. In this task, we first feed the model a context (Paris is in England) and a question (Where is Paris?); we then instruct the model to either use its prior or contextual knowledge and evaluate whether it generates the correct answer for both intents (either France or England). When fine-tuned on this task, instruction-tuned versions of Llama-3.1, Mistral-v0.3, and Gemma-2 can solve it with high accuracy (85–95%). Analyzing these high-performing models, we narrow down which layers may be important to context sensitivity using a novel linear-time algorithm. Then, in each model, we identify a 1-D subspace in a single layer that encodes whether the model follows context or prior knowledge. Interestingly, while we identify this subspace in a fine-tuned model, we find that the exact same subspace serves as an effective knob not only in that model but also in non-fine-tuned instruct and base models of that model family. Finally, we show a strong correlation between a model's performance and how distinctly it separates context-agreeing from context-ignoring answers in this subspace. These results suggest a single subspace facilitates how the model chooses between context and prior knowledge, hinting at a simple fundamental mechanism that controls this behavior.
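To make the "knob" concrete, the kind of intervention described (steering a hidden state along a learned 1-D subspace at a single layer) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the direction vector `direction` and the steering coefficient `target` are hypothetical stand-ins for the probe-derived subspace and intervention strength.

```python
import numpy as np

def steer_along_subspace(hidden, direction, target):
    """Replace the component of `hidden` along `direction` with `target`.

    hidden:    (d,) residual-stream activation at one layer and position
    direction: (d,) vector spanning the hypothesized 1-D subspace
    target:    scalar coefficient; in the paper's framing, one sign would
               push toward context-following and the other toward
               prior-following behavior (values here are illustrative)
    """
    v = direction / np.linalg.norm(direction)  # unit vector for the knob
    coeff = hidden @ v                         # current projection on the knob
    # remove the existing component along v, then set it to `target`
    return hidden - coeff * v + target * v

# toy demonstration with a random activation and direction
rng = np.random.default_rng(0)
h = rng.normal(size=8)
v = rng.normal(size=8)
h_steered = steer_along_subspace(h, v, target=3.0)
```

After the call, the steered activation has projection exactly 3.0 on the normalized knob direction, while all orthogonal components of `h` are untouched; in practice such an edit would be applied to a transformer's hidden state at the identified layer during the forward pass.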