Implicit Representations of Grammaticality in Language Models

📅 2026-05-06

🤖 AI Summary

Although language models generate syntactically well-formed text, their string probabilities are unreliable for distinguishing grammatical from ungrammatical sentences. This work examines whether the hidden layers of language models encode grammaticality separately from likelihood. The authors train linear probes on a dataset of synthetic syntactic perturbations, then evaluate them on human-curated grammaticality judgment benchmarks and on benchmarks in other languages. The probes outperform string-probability scores on these grammaticality benchmarks, including cross-lingually, and their scores correlate only weakly with string probabilities. On semantic plausibility benchmarks, however, the probes perform worse than string probability, which suggests they are sensitive to syntactic rather than semantic information.
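The comparison summarized above can be made concrete with a small sketch. It assumes each minimal pair has already been scored twice, once by a probe and once by the model's string log-probability. The function names `pair_accuracy` and `pearson` and all the numbers are hypothetical, chosen for illustration; they are not results or code from the paper.

```python
import math

def pair_accuracy(good_scores, bad_scores):
    """Fraction of minimal pairs whose grammatical member scores strictly higher."""
    pairs = list(zip(good_scores, bad_scores))
    return sum(g > b for g, b in pairs) / len(pairs)

def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up scores for four minimal pairs (illustrative only).
probe_good = [2.1, 1.4, 0.9, 1.8]         # probe score, grammatical member
probe_bad = [-1.0, -0.3, 0.2, -1.5]       # probe score, ungrammatical member
logp_good = [-21.0, -35.5, -18.2, -40.1]  # string log-probability, grammatical
logp_bad = [-22.4, -33.0, -19.0, -38.7]   # string log-probability, ungrammatical

probe_acc = pair_accuracy(probe_good, probe_bad)
logp_acc = pair_accuracy(logp_good, logp_bad)

# Correlation between the two scoring methods over all eight sentences.
r = pearson(probe_good + probe_bad, logp_good + logp_bad)
```

Pairwise accuracy needs only the ordering of scores within each pair, so the two methods can be compared even though their scores are on different scales.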

📝 Abstract

Grammaticality and likelihood are distinct notions in human language. Pretrained language models (LMs), which are probabilistic models of language fitted to maximize corpus likelihood, generate grammatically well-formed text and discriminate well between grammatical and ungrammatical sentences in tightly controlled minimal pairs. However, their string probabilities do not sharply discriminate between grammatical and ungrammatical sentences overall. But do LMs implicitly acquire a notion of grammaticality that is separate from string probability? We explore this question by studying the internal representations of LMs: we train a linear probe on a dataset of grammatical and (synthetic) ungrammatical sentences obtained by applying perturbations to a naturalistic text corpus. We find that this simple grammaticality probe generalizes to human-curated grammaticality judgment benchmarks and outperforms LM probability-based grammaticality judgments. When applied to semantic plausibility benchmarks, in which both members of a minimal pair are grammatical and differ only in plausibility, however, the probe performs worse than string probability. The English-trained probe also exhibits nontrivial cross-lingual generalization, outperforming string probabilities on grammaticality benchmarks in numerous other languages. Additionally, probe scores correlate only weakly with string probabilities. These results collectively suggest that LMs acquire, to some extent, an implicit grammaticality distinction within their hidden layers.
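As a rough illustration of what training a linear probe involves, the sketch below fits a logistic-regression classifier to labeled feature vectors. It is a minimal sketch under stated assumptions, not the authors' implementation. The vectors here are synthetic stand-ins for LM hidden states (Gaussian noise with a class-dependent shift along one axis). The helper names (`fake_hidden_state`, `train_linear_probe`, `probe_score`), the optimizer, and the hyperparameters are all hypothetical.

```python
import math
import random

DIM = 8
rng = random.Random(0)  # fixed seed so the toy data is reproducible

def fake_hidden_state(grammatical):
    # Assumption: stand-in for a real LM hidden vector. Gaussian noise
    # plus a shift on axis 0 whose sign depends on the class label.
    vec = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
    vec[0] += 2.0 if grammatical else -2.0
    return vec

def make_split(n_per_class):
    xs, ys = [], []
    for label in (1, 0):  # 1 = grammatical, 0 = ungrammatical
        for _ in range(n_per_class):
            xs.append(fake_hidden_state(bool(label)))
            ys.append(label)
    return xs, ys

def probe_score(w, b, x):
    # Linear probe: a single affine map from features to a scalar score.
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_linear_probe(features, labels, lr=0.1, epochs=200):
    """Fit logistic regression by full-batch gradient descent."""
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            p = 1.0 / (1.0 + math.exp(-probe_score(w, b, x)))
            err = p - y  # gradient of the log loss w.r.t. the score
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * gi / n for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

train_x, train_y = make_split(200)
test_x, test_y = make_split(100)
w, b = train_linear_probe(train_x, train_y)

# Classify held-out vectors by the sign of the probe score.
preds = [1 if probe_score(w, b, x) > 0 else 0 for x in test_x]
accuracy = sum(p == y for p, y in zip(preds, test_y)) / len(test_y)
```

In an actual experiment the feature vectors would come from a chosen hidden layer of the LM, and the labels from the original versus perturbed sentences. The probe stays deliberately simple so that good accuracy reflects what the representation encodes, not the capacity of the classifier.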

Problem

Research questions and friction points this paper is trying to address.

grammaticality

language models

implicit representations

string probability

probing

Innovation

Methods, ideas, or system contributions that make the work stand out.

implicit grammaticality

linear probing

language models