🤖 AI Summary
Idiomatic non-compositionality—where literal and figurative meanings diverge—poses representation and disambiguation challenges for language models. This work investigates how large language models process idioms, using LLaMA3.2-1B-base and mechanistic interpretability techniques including attention head attribution, activation tracing, and sublayer intervention. We identify, for the first time, a three-stage neural pathway underlying idiom comprehension: (1) parallel activation of both literal and figurative meanings early in processing; (2) selective enhancement of figurative meaning and suppression of literal meaning by specific attention heads; and (3) coexistence of these meanings via a shared “intermediate representational pathway” and an independent “literal bypass.” Our findings provide the first fine-grained neural evidence that autoregressive decoding dynamically coordinates dual semantic interpretations. This advances cognitive modeling of idiom processing and establishes a foundation for controllable semantic intervention in language models.
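The "sublayer intervention" mentioned above follows the general activation-patching recipe: run the model on a corrupted input, overwrite one sublayer's activation with its value from a clean run, and check how much of the clean behavior is restored. A minimal toy sketch of that idea (not the paper's actual setup; the two-layer network and tensor names are illustrative stand-ins for transformer sublayers):

```python
import torch

torch.manual_seed(0)

# Toy two-layer network standing in for a transformer's sublayers.
lin1 = torch.nn.Linear(8, 8)
lin2 = torch.nn.Linear(8, 2)

def forward(x, patch_hidden=None):
    h = torch.relu(lin1(x))
    if patch_hidden is not None:   # intervention: overwrite the sublayer activation
        h = patch_hidden
    return lin2(h)

clean, corrupt = torch.randn(8), torch.randn(8)
with torch.no_grad():
    clean_hidden = torch.relu(lin1(clean))          # cache the clean activation
    patched = forward(corrupt, patch_hidden=clean_hidden)
    restored = forward(clean)

# Patching the clean activation into the corrupted run recovers the clean output,
# which is the causal evidence that this sublayer carries the relevant information.
assert torch.allclose(patched, restored)
```

In the paper's setting the clean/corrupt pair would be figurative versus literal contexts for the same idiom, and the patched site would be an attention or MLP sublayer of LLaMA3.2-1B-base.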
📝 Abstract
Idioms present a unique challenge for language models due to their non-compositional figurative meanings, which often diverge sharply from the idiom's literal interpretation. This duality requires a model to learn to represent both meanings and to decide between them, interpreting an idiom either figuratively or literally. In this paper, we employ tools from mechanistic interpretability to trace how a large pretrained causal transformer (LLaMA3.2-1B-base) resolves this ambiguity. We localize three stages of idiom processing: First, the idiom's figurative meaning is retrieved in early attention and MLP sublayers. Second, specific attention heads boost the figurative meaning while suppressing the idiom's literal interpretation. Third, the figurative meaning is propagated through an intermediate representational pathway, while a parallel bypass route forwards the literal interpretation, ensuring that both readings remain available. Overall, our findings provide mechanistic evidence for idiom comprehension in an autoregressive transformer.
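The attention-head analysis can be pictured as direct-effect logit attribution: each head writes a vector into the residual stream, and projecting that write through the unembedding measures its contribution to the logit difference between a figurative and a literal continuation token. A hedged sketch with random toy tensors (all shapes, token ids, and names here are illustrative, not the paper's code):

```python
import torch

torch.manual_seed(0)
d_model, n_heads, vocab = 16, 4, 10
W_U = torch.randn(d_model, vocab)             # toy unembedding matrix
head_outputs = torch.randn(n_heads, d_model)  # per-head residual writes at the final position

fig_tok, lit_tok = 3, 7                       # hypothetical token ids
# "figurative minus literal" direction in logit space
direction = W_U[:, fig_tok] - W_U[:, lit_tok]

# Direct-effect attribution: project each head's write onto the direction.
# A positive score means the head pushes toward the figurative reading;
# a negative score means it suppresses it in favor of the literal one.
attribution = head_outputs @ direction        # shape (n_heads,)
for h, score in enumerate(attribution.tolist()):
    print(f"head {h}: {score:+.3f}")

# Attribution is linear, so per-head scores sum to the heads' total direct effect.
assert torch.isclose(attribution.sum(), head_outputs.sum(0) @ direction)
```

In the actual experiments the writes would come from LLaMA3.2-1B-base's attention heads at the idiom's final token, with the figurative/literal tokens drawn from paraphrase continuations.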