🤖 AI Summary
This study addresses the ambiguity in existing research regarding the distinction between semantically anomalous sentences, which can be interpreted given a supporting context, and truly meaningless ones, as well as the lack of systematic evaluation of large language models' (LLMs') ability to make this distinction. It presents the first systematic comparison between human judgments and LLM assessments of the interpretability of five types of semantic anomalies, both with and without contextual support, and further leverages LLMs to generate coherent contexts that enhance the semantic plausibility of such sentences. Through multi-dataset analysis and human evaluation, the study finds that the majority of sentences labeled as “meaningless” are in fact interpretable anomalies. Moreover, LLMs not only discern these distinctions effectively but also produce contexts that significantly improve semantic coherence, exposing the prevalent overuse of the “meaningless” label in current datasets.
📝 Abstract
Nonsensical and anomalous sentences have been instrumental in the development of computational models of semantic interpretation. A core challenge is to distinguish between what is merely anomalous (but can be interpreted given a supporting context) and what is truly nonsensical. However, it is unclear (a) how nonsensical, rather than merely anomalous, existing datasets actually are; and (b) how well LLMs can make this distinction. In this paper, we answer both questions by collecting sensicality judgments from human raters and LLMs on sentences from five semantically deviant datasets, both context-free and with a supporting context. We find that raters judge most sentences to be at most anomalous, and only a few to be properly nonsensical. We also show that LLMs are substantially skilled at generating plausible contexts for anomalous cases.
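The abstract does not specify the prompts or models used. As a minimal sketch of the two LLM tasks it describes (rating a sentence's sensicality without context, and generating a supporting context for an anomalous sentence), the snippet below assumes an OpenAI-style chat API; the model name, prompt wording, and 1–5 rating scale are illustrative assumptions, not the paper's protocol.

```python
# Minimal sketch of the two LLM tasks described in the abstract:
# (1) rate a sentence's sensicality without context, and
# (2) generate a supporting context that makes an anomalous sentence plausible.
# Assumes the OpenAI Python SDK (>=1.0); model name, prompts, and the
# 1-5 scale are illustrative assumptions, not the study's protocol.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def rate_sensicality(sentence: str) -> str:
    """Ask the model for a context-free sensicality rating
    (1 = nonsensical, 5 = fully sensical)."""
    response = client.chat.completions.create(
        model="gpt-4o",  # hypothetical choice
        messages=[
            {"role": "system",
             "content": "Rate the sentence's sensicality from 1 (nonsensical) "
                        "to 5 (fully sensical). Reply with the number only."},
            {"role": "user", "content": sentence},
        ],
    )
    return response.choices[0].message.content.strip()

def generate_context(sentence: str) -> str:
    """Ask the model for a short context that makes the sentence plausible."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "Write a short (2-3 sentence) context in which the "
                        "following sentence would be plausible and natural."},
            {"role": "user", "content": sentence},
        ],
    )
    return response.choices[0].message.content.strip()

if __name__ == "__main__":
    sentence = "The ship sailed across the meadow."  # anomalous, not nonsensical
    print("Rating (no context):", rate_sensicality(sentence))
    print("Generated context:", generate_context(sentence))
```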