🤖 AI Summary
This study investigates whether language model surprisal can capture metaphorical novelty and how this relationship varies across data types. By systematically evaluating cloze-style surprisal values from 16 language models of varying scale and architecture against human-rated novelty scores on both corpus-derived and synthetically generated metaphor datasets, the work provides the first evidence of a moderate correlation between surprisal and perceived novelty. Notably, the two data types show opposing scaling trends: in naturally occurring corpora the correlation weakens as model scale increases, whereas in synthetic data it strengthens. These findings suggest that surprisal, while informative, is a context-dependent and limited proxy for linguistic creativity, and should be interpreted with caution depending on the nature of the underlying data.
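To make the evaluation setup concrete, the sketch below correlates per-item surprisal values with human novelty ratings. This is a minimal illustration, not the paper's code: the toy data are invented, and the choice of Spearman's rank correlation is an assumption (the paper reports correlation strength without the statistic being specified here).

```python
from scipy.stats import spearmanr

# Hypothetical per-item data: surprisal of the metaphoric word in each
# sentence, paired with the human novelty rating for that metaphor.
surprisals = [4.2, 7.9, 3.1, 9.4, 6.0]
novelty_ratings = [1.5, 3.0, 1.0, 3.5, 2.5]

# Rank correlation between model surprisal and perceived novelty.
rho, p_value = spearmanr(surprisals, novelty_ratings)
print(f"Spearman rho = {rho:.3f} (p = {p_value:.3g})")
```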
📝 Abstract
Novel metaphor comprehension involves complex semantic processes and linguistic creativity, making it an interesting task for studying language models (LMs). This study investigates whether surprisal, a probabilistic measure of predictability in LMs, correlates with annotations of metaphor novelty across different datasets. We analyse the surprisal of metaphoric words in corpus-based and synthetic metaphor datasets using 16 causal LM variants, and propose a cloze-style surprisal method that conditions on full-sentence context. Results show that LM surprisal yields significant, moderate correlations with metaphor novelty scores and labels. We further identify divergent scaling patterns: on corpus-based data, correlation strength decreases with model size (an inverse scaling effect), whereas on synthetic data it increases (consistent with the quality-power hypothesis). We conclude that while surprisal can partially account for annotations of metaphor novelty, it remains limited as a metric of linguistic creativity. Code and data are publicly available: https://github.com/OmarMomen14/surprisal-metaphor-novelty
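The sketch below shows one plausible implementation of cloze-style surprisal with a causal LM: the target word is blanked out of the sentence so the model can condition on the full-sentence context, and surprisal is summed over the target's subword tokens. The prompt template, the use of GPT-2, and the `cloze_surprisal` helper are assumptions for illustration, not the paper's exact formulation.

```python
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; the study evaluates 16 causal LM variants

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()


def cloze_surprisal(sentence: str, target: str) -> float:
    """Surprisal (in bits) of `target`, conditioning on the full sentence
    with the target word blanked out. The prompt template is an assumed
    stand-in for the paper's cloze formulation."""
    blanked = sentence.replace(target, "___", 1)
    prefix = f"Fill in the blank: {blanked}\nAnswer:"
    prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids
    target_ids = tokenizer(" " + target, return_tensors="pt").input_ids

    input_ids = torch.cat([prefix_ids, target_ids], dim=1)
    with torch.no_grad():
        log_probs = torch.log_softmax(model(input_ids).logits, dim=-1)

    # Sum log-probabilities of the target's subword tokens; the logits at
    # position i predict the token at position i + 1.
    n = prefix_ids.shape[1]
    logp = sum(
        log_probs[0, n + i - 1, target_ids[0, i]].item()
        for i in range(target_ids.shape[1])
    )
    return -logp / math.log(2)  # convert nats to bits


print(cloze_surprisal("The lawyer shredded the witness's story.", "shredded"))
```

Per-item surprisal values computed this way can then be correlated with the human novelty annotations, as in the correlation sketch above. Summing over subword tokens is one common convention; averaging is another, and the choice here is an assumption.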