🤖 AI Summary
This study investigates whether large language models (LLMs) exhibit syntactic bootstrapping, the cognitive mechanism by which children infer verb semantics from a verb's syntactic distribution, with particular attention to psychological verbs. Method: The authors train RoBERTa and GPT-2 on datasets in which either syntactic structure or lexical co-occurrence statistics are systematically perturbed, quantifying how much verb and noun representations rely on syntactic versus distributional cues. Contribution/Results: Verb representations degrade significantly under syntactic perturbation, whereas noun representations depend more strongly on co-occurrence statistics; mental verbs, for which syntactic bootstrapping is especially important in human verb learning, are most affected. This provides empirical support for the syntactic bootstrapping hypothesis in large-scale pretrained language models. Moreover, the study demonstrates that manipulating models' training environments offers a scalable way to test developmental linguistic theories, bridging computational modeling and cognitive linguistics. The findings underscore syntax as a critical inductive bias for verb learning in LLMs, aligning with psycholinguistic accounts of early language acquisition.
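The paper's exact perturbation procedures are not spelled out in this summary, but the two ablations it contrasts can be sketched in a minimal, hypothetical form: shuffling tokens within a sentence destroys syntactic structure while leaving bag-of-words co-occurrence intact, whereas swapping tokens across sentences (here, by sampling from position-matched pools over the corpus) scrambles which words co-occur while roughly preserving sentence shape. The function names and the pooling strategy below are illustrative assumptions, not the authors' implementation.

```python
import random

def ablate_syntax(sentence: str, seed: int = 0) -> str:
    """Shuffle tokens within one sentence: which words co-occur is
    preserved, but syntactic structure (word order) is destroyed."""
    tokens = sentence.split()
    rng = random.Random(seed)
    rng.shuffle(tokens)
    return " ".join(tokens)

def ablate_cooccurrence(sentences: list[str], seed: int = 0) -> list[str]:
    """Replace each token with a random token drawn from the pool of
    tokens seen at the same sentence position anywhere in the corpus:
    sentence length and rough positional structure survive, but the
    lexical co-occurrence statistics are scrambled."""
    rng = random.Random(seed)
    # Pool tokens by sentence position across the whole corpus.
    pools: dict[int, list[str]] = {}
    for s in sentences:
        for i, tok in enumerate(s.split()):
            pools.setdefault(i, []).append(tok)
    out = []
    for s in sentences:
        n = len(s.split())
        out.append(" ".join(rng.choice(pools[i]) for i in range(n)))
    return out
```

Training one model on each perturbed corpus and comparing the resulting verb versus noun representations is the kind of controlled contrast the summary describes.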
📝 Abstract
Syntactic bootstrapping (Gleitman, 1990) is the hypothesis that children use the syntactic environments in which a verb occurs to learn its meaning. In this paper, we examine whether large language models exhibit similar behavior. We do so by training RoBERTa and GPT-2 on perturbed datasets in which syntactic information is ablated. Our results show that models' verb representations degrade more when syntactic cues are removed than when co-occurrence information is removed. Furthermore, the representation of mental verbs, for which syntactic bootstrapping has been shown to be particularly crucial in human verb learning, is more negatively impacted under such training regimes than that of physical verbs. In contrast, models' representations of nouns are affected more when co-occurrences are distorted than when syntax is distorted. In addition to reinforcing the important role of syntactic bootstrapping in verb learning, our results demonstrate the viability of testing developmental hypotheses at a larger scale by manipulating the learning environments of large language models.