🤖 AI Summary
This work investigates how human-scale Transformer language models generalize to the rare English LET-ALONE construction, probing for asymmetries between knowledge of its syntactic form and its semantic interpretation. Addressing the open question of whether such models generalize meaning, and not just form, in a human-like way, the authors design a synthetic benchmark with controlled filtering of the pretraining data and a dual-axis evaluation framework that assesses both syntactic acceptability and semantic entailment. Results show that the models reliably recognize the construction's rare surface form, even when related constructions are filtered from the training data, yet they systematically fail to capture its distinctive semantic constraints on entailment tasks. These results provide empirical evidence of a substantial gap between form-based and meaning-based generalization in current language models, suggesting markedly lower semantic sample efficiency than that of human learners. The findings inform research on linguistic acquisition and the cognitive limits of neural language models.
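A common way to operationalize the syntactic-acceptability axis is to compare an LM's log-probabilities on minimal pairs. The sketch below is a hypothetical illustration of that setup, not the paper's released benchmark code; the model name (`gpt2` as a stand-in for a human-scale LM), the example sentences, and the scoring function are all illustrative assumptions.

```python
# Hypothetical minimal-pair acceptability probe (illustrative, not the
# paper's benchmark code): compare a causal LM's total log-probability
# for a licensed LET-ALONE sentence against an unlicensed variant.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # assumption: a stand-in for a human-scale LM

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def sentence_log_prob(sentence: str) -> float:
    """Total token log-probability under the LM (higher = more acceptable).

    Unnormalized by length, so minimal pairs should be closely matched."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids, HF causal LMs return the mean
        # cross-entropy over the seq_len - 1 predicted tokens.
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.size(1) - 1)

# Minimal pair targeting LET-ALONE's licensing condition: the construction
# normally requires a negative (downward-entailing) context.
good = "She can't eat shrimp, let alone lobster."
bad = "She can eat shrimp, let alone lobster."

print(sentence_log_prob(good) > sentence_log_prob(bad))
```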
📝 Abstract
Humans have a remarkable ability to acquire and understand grammatical phenomena that are seen rarely, if ever, during childhood. Recent evidence suggests that language models with human-scale pretraining data may possess a similar ability by generalizing from frequent to rare constructions. However, it remains an open question how widespread this generalization ability is, and to what extent this knowledge extends to the meanings of rare constructions, as opposed to just their forms. We fill this gap by testing human-scale transformer language models on their knowledge of both the form and meaning of the (rare and quirky) English LET-ALONE construction. To evaluate our LMs, we construct a bespoke synthetic benchmark that targets syntactic and semantic properties of the construction. We find that human-scale LMs are sensitive to form, even when related constructions are filtered from the dataset. However, human-scale LMs do not make correct generalizations about LET-ALONE's meaning. These results point to an asymmetry in current architectures' sample efficiency between language form and meaning, one that is not present in human language learners.
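For the semantic axis, one plausible (assumed, not the authors') probe format checks whether the model draws the entailment LET-ALONE licenses: denying the "easy" first conjunct commits the speaker to denying the "harder" second one. The continuation below reuses `sentence_log_prob` from the sketch above; the premise and continuations are again illustrative.

```python
# Continues the sketch above (same model, tokenizer, sentence_log_prob).
# Hypothetical semantic-entailment probe: the entailed continuation should
# outscore its contradiction if the LM has learned LET-ALONE's meaning.
premise = "She can't eat shrimp, let alone lobster."
entailed = premise + " So she can't eat lobster."
contradicted = premise + " So she can eat lobster."

print(sentence_log_prob(entailed) > sentence_log_prob(contradicted))
```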