🤖 AI Summary
This study investigates how statistical regularities in linguistic input may facilitate syntactic acquisition, focusing on English subject–verb agreement. It proposes the “collocational bootstrapping” hypothesis, which posits that word co-occurrence patterns can provide cues to syntactic dependencies. The work simulates acquisition by training neural networks on synthetic datasets that vary in how predictable their subject–verb pairings are, and then measures the variability of subject–verb pairings in child-directed speech, linking collocational co-occurrence statistics to subject–verb agreement learning. The simulations show that learners acquire agreement robustly within a range of moderate variability in subject–verb pairings, and the variability observed in real child-directed input falls within that range. These results support the collocational bootstrapping hypothesis and suggest that it is a viable learning strategy given the statistical structure of children’s linguistic input.
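The summary does not say how “variability in subject–verb pairings” is quantified. One common choice, used here purely as an assumption, is the conditional entropy H(V | S) of the verb given the subject: it is 0 when every subject always appears with the same verb and grows as pairings become more varied. The function name `verb_given_subject_entropy` is hypothetical, and this is a minimal sketch rather than the paper’s measure.

```python
from collections import Counter, defaultdict
import math


def verb_given_subject_entropy(pairs):
    """Conditional entropy H(V | S), in bits, over (subject, verb) pairs.

    Hypothetical variability measure (an assumption, not the paper's metric):
    0.0 means each subject always co-occurs with one verb; larger values mean
    more varied subject-verb pairings.
    """
    # Count verbs separately for each subject.
    by_subject = defaultdict(Counter)
    for subj, verb in pairs:
        by_subject[subj][verb] += 1

    total = len(pairs)
    h = 0.0
    for verb_counts in by_subject.values():
        n_s = sum(verb_counts.values())
        # Entropy of the verb distribution observed with this subject.
        h_s = -sum((c / n_s) * math.log2(c / n_s) for c in verb_counts.values())
        # Weight by the subject's relative frequency p(s).
        h += (n_s / total) * h_s
    return h


pairs = [("dog", "barks"), ("dog", "barks"), ("dogs", "bark"), ("dogs", "run")]
print(verb_given_subject_entropy(pairs))  # 0.5
```

In the example, “dog” always takes “barks” (0 bits) while “dogs” splits evenly between two verbs (1 bit); each subject accounts for half the pairs, giving 0.5 bits overall.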
📝 Abstract
In what ways might statistical signals in linguistic input assist with the acquisition of syntax? Here we hypothesize a mechanism called collocational bootstrapping, in which regularities in word co-occurrence patterns can provide cues to syntactic dependencies. We investigate whether this mechanism can support the acquisition of English subject-verb agreement. First, we simulate language acquisition by training neural networks on synthetic datasets that vary in how predictable their subject-verb pairings are. We find that there is a range of variability levels at which these statistical learners robustly learn subject-verb agreement. We then analyze the variability of subject-verb pairings in child-directed language, and we find that the variability in such data falls within the range that supported robust generalization in our computational simulations. Taken together, these results suggest that collocational bootstrapping is a viable learning strategy for the type of input that children receive.
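To make “synthetic datasets that vary in how predictable their subject-verb pairings are” concrete, the sketch below generates toy sentences with correct agreement and a single knob, `predictability`: the probability that a noun appears with its own fixed preferred verb rather than a uniformly drawn one. The lexicon, the `make_corpus` function, and the parameter are all hypothetical illustrations; the abstract does not describe the actual grammar or how predictability was manipulated.

```python
import random

# Hypothetical toy lexicon: each verb lemma maps to (singular, plural) forms.
NOUNS = ["dog", "cat", "bird", "horse"]
VERBS = {
    "run": ("runs", "run"),
    "eat": ("eats", "eat"),
    "sleep": ("sleeps", "sleep"),
    "jump": ("jumps", "jump"),
}


def make_corpus(n, predictability, seed=0):
    """Generate n 'the SUBJECT VERB' sentences with correct agreement.

    predictability in [0, 1]: probability that a noun is paired with its
    fixed preferred verb; otherwise the verb lemma is drawn uniformly.
    Hypothetical generator, not the paper's actual dataset construction.
    """
    rng = random.Random(seed)
    lemmas = list(VERBS)
    # Assign each noun one fixed preferred verb lemma.
    preferred = {noun: lemmas[i % len(lemmas)] for i, noun in enumerate(NOUNS)}
    corpus = []
    for _ in range(n):
        noun = rng.choice(NOUNS)
        plural = rng.random() < 0.5
        if rng.random() < predictability:
            lemma = preferred[noun]
        else:
            lemma = rng.choice(lemmas)
        sing_form, plur_form = VERBS[lemma]
        subject = noun + "s" if plural else noun
        # Agreement: the verb's number always matches the subject's number.
        verb = plur_form if plural else sing_form
        corpus.append(f"the {subject} {verb}")
    return corpus


print(make_corpus(5, predictability=0.8))
```

Agreement is held constant across all settings; only the co-occurrence statistics change, so a learner can be trained on corpora with low, intermediate, and high `predictability` and then tested on subject-verb combinations it has not seen.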