🤖 AI Summary
This study investigates whether large language models (LLMs) can reliably distinguish grammatical from ungrammatical sentences using token-level surprisal—a common proxy for syntactic knowledge—thereby challenging the default assumption that generation probabilities directly reflect grammatical competence. We introduce a minimal-pair surprisal-difference analysis and construct a novel benchmark targeting syntactic, semantic, and pragmatic anomalies under controlled conditions. Four state-of-the-art LLMs are systematically evaluated on their surprisal responses to these stimuli. Results show no consistent increase in surprisal for ungrammatical sentences; instead, semantic and pragmatic violations elicit significantly higher surprisal than syntactic ones. This indicates that surface-level probability outputs do not robustly encode or expose LLMs’ implicit syntactic knowledge boundaries. Our work provides both a methodological advance—via minimal-pair surprisal contrast—and empirical evidence that challenges probabilistic proxies for grammaticality, offering a refined framework for evaluating linguistic competence in LLMs.
📝 Abstract
A controversial test for Large Language Models concerns their ability to discern possible from impossible language. While some evidence attests to the models' sensitivity to what crosses the limits of possible language, this evidence has been contested on the grounds of the soundness of the testing material. We use model-internal representations to tap directly into the way Large Language Models represent the 'grammatical-ungrammatical' distinction. In a novel benchmark, we elicit probabilities from 4 models and compute minimal-pair surprisal differences, juxtaposing the probabilities assigned to grammatical sentences with those assigned to (i) lower-frequency grammatical sentences, (ii) ungrammatical sentences, (iii) semantically odd sentences, and (iv) pragmatically odd sentences. The prediction is that if string probabilities can function as proxies for the limits of grammar, the ungrammatical condition will stand out among the conditions that involve linguistic violations, showing a spike in surprisal. Our results do not reveal a unique surprisal signature for ungrammatical prompts: the semantically and pragmatically odd conditions consistently show higher surprisal. We thus demonstrate that probabilities do not constitute reliable proxies for model-internal representations of syntactic knowledge. Consequently, claims about models being able to distinguish possible from impossible language need verification through a different methodology.
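The minimal-pair surprisal-difference measure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes per-token probabilities have already been elicited from a model, and the sentences and probability values shown are invented for demonstration.

```python
import math

def surprisal(token_probs):
    """Total surprisal of a sentence in bits: sum of -log2(p) over its tokens."""
    return sum(-math.log2(p) for p in token_probs)

def minimal_pair_delta(grammatical_probs, variant_probs):
    """Surprisal difference within a minimal pair: variant minus grammatical
    baseline. A large positive value means the variant is more surprising
    to the model than its grammatical counterpart."""
    return surprisal(variant_probs) - surprisal(grammatical_probs)

# Hypothetical per-token probabilities for one minimal pair
# (invented values, for illustration only):
grammatical = [0.20, 0.50, 0.40]    # e.g. "The keys are on the table"
ungrammatical = [0.20, 0.10, 0.40]  # e.g. "The keys is on the table"

delta = minimal_pair_delta(grammatical, ungrammatical)
```

Under the paper's prediction, deltas for the ungrammatical condition would spike above those for the semantically and pragmatically odd conditions if string probabilities tracked the limits of grammar; the reported results show the opposite pattern.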