How Language Models Process Negation

📅 2026-05-04
📈 Citations: 0
Influential: 0

🤖 AI Summary
Open-weight large language models often answer negation questions incorrectly, and the underlying mechanisms remain poorly understood. This study examines Mistral-7B and Llama-3.1-8B with observational and causal interpretability methods and finds that the models do contain internal components that process negation correctly. They use two distinct strategies: attention heads that attend to the negated phrase and suppress related concepts (the suppression mechanism), and direct construction of a representation of the whole negated phrase (the construction mechanism), with the latter being more prominent. The poor accuracy is traced to late-layer attention modules that promote simple shortcuts; ablating those modules greatly improves accuracy on negation-related questions.
📝 Abstract
We study how Large Language Models (LLMs) process negation mechanistically. First, we establish that even though open-weight models often provide wrong answers to questions involving negation, they do possess internal components that process negation correctly. Their poor accuracy is due to late-layer attention behavior that promotes simple shortcuts; ablating those attention modules greatly improves accuracy on negation-related questions. Second, we uncover how models process negation. We consider two hypotheses: models could use attention heads that attend to the phrase being negated and suppress related concepts, or they could directly construct a representation of the entire negative phrase (e.g., representing "not gas" as a vector that promotes liquids and solids). We apply a range of observational and causal interpretability techniques on Mistral-7B and Llama-3.1-8B to show that models implement both mechanisms, with the "constructive" mechanism being more prominent. Combined, our work deepens the understanding of LLMs' internals, highlighting construction-dominant computations and the coexistence of competing mechanisms within LLMs.
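The two hypotheses and the ablation result can be illustrated with a toy sketch. It is not the authors' method and uses no real model: the three-word vocabulary, the identity "unembedding", the function names, and the shortcut strength of 3.0 are all assumptions made for illustration. It shows only that a suppression-style vector and a construction-style vector can promote the same answers for "not gas", and that removing a late shortcut contribution that copies the negated word changes the top prediction.

```python
# Toy illustration only: a 3-word vocabulary where each residual-stream
# dimension maps directly to one word's logit (hypothetical setup).
VOCAB = ["gas", "liquid", "solid"]


def one_hot(word):
    return [1.0 if w == word else 0.0 for w in VOCAB]


def add(*vectors):
    return [sum(values) for values in zip(*vectors)]


def scale(vector, factor):
    return [factor * x for x in vector]


def suppression(negated):
    # Hypothesis 1: promote the whole category, then suppress the negated concept.
    category = [1.0] * len(VOCAB)
    return add(category, scale(one_hot(negated), -1.0))


def construction(negated):
    # Hypothesis 2: directly build a vector for "not <negated>" that
    # promotes every other member of the category.
    return add(*[one_hot(w) for w in VOCAB if w != negated])


def shortcut(negated, strength=3.0):
    # Assumed late-layer shortcut: simply promote the word that was mentioned.
    return scale(one_hot(negated), strength)


def forward(negated, ablate_shortcut=False):
    # Sum the component outputs; ablation here means zeroing the shortcut term.
    components = [suppression(negated), construction(negated)]
    if not ablate_shortcut:
        components.append(shortcut(negated))
    return dict(zip(VOCAB, add(*components)))


def top_answers(scores):
    best = max(scores.values())
    return sorted(w for w, s in scores.items() if s == best)


if __name__ == "__main__":
    print(top_answers(forward("gas")))
    print(top_answers(forward("gas", ablate_shortcut=True)))
```

In this toy setting the shortcut term outweighs both negation components, so the un-ablated prediction is the negated word itself; with the shortcut zeroed, the remaining components favor the other two states.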
Problem

Research questions and friction points this paper is trying to address.

negation
large language models
mechanistic interpretability
attention mechanisms
language understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

negation processing
attention ablation
constructive representation
mechanistic interpretability
large language models