🤖 AI Summary
This work investigates whether language models acquire rare syntactic constructions—specifically, the extremely low-frequency English Article+Adjective+Numeral+Noun (AANN) pattern (e.g., “a beautiful five days”)—through grammatical generalization rather than rote memorization.
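Method: Transformer language models are trained on human-scale corpora that are systematically manipulated. AANN sentences are removed to create a counterfactual corpus, and additional counterfactual corpora manipulate related constructions. Learning is evaluated by comparing AANNs against systematically perturbed variants of the construction.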
Contribution/Results: The work provides an existence proof that language models can learn rare grammatical phenomena by generalizing from less rare ones: models trained on a corpus with AANN sentences removed still learn AANNs better than systematically perturbed variants of the construction. Additional counterfactual corpora suggest that this learning arises through generalization from related constructions (e.g., “a few days”), and a further experiment shows that it is enhanced when the input is more variable.
📝 Abstract
Language models learn rare syntactic phenomena, but the extent to which this is attributable to generalization vs. memorization is a major open question. To that end, we iteratively trained transformer language models on systematically manipulated corpora which were human-scale in size, and then evaluated their learning of a rare grammatical phenomenon: the English Article+Adjective+Numeral+Noun (AANN) construction (“a beautiful five days”). We compared how well this construction was learned on the default corpus relative to a counterfactual corpus in which AANN sentences were removed. We found that AANNs were still learned better than systematically perturbed variants of the construction. Using additional counterfactual corpora, we suggest that this learning occurs through generalization from related constructions (e.g., “a few days”). An additional experiment showed that this learning is enhanced when there is more variability in the input. Taken together, our results provide an existence proof that LMs can learn rare grammatical phenomena by generalization from less rare phenomena. Data and code: https://github.com/kanishkamisra/aannalysis.
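To make the counterfactual setup concrete, the following is a minimal Python sketch of two steps the abstract describes: removing AANN sentences from a corpus, and building perturbed variants to compare against. It is an illustration under stated assumptions, not the authors' pipeline (see the linked repository for that). The regex heuristic, the small `NUMERALS` lexicon, the function names, and the specific perturbation types are all hypothetical; a realistic detector would likely rely on part-of-speech tags.

```python
import re

# Assumption: a tiny numeral lexicon for illustration only.
NUMERALS = {"two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"}

# Heuristic: article + one word (candidate adjective) + numeral + one word (candidate noun).
# This will over- and under-match on real text; it is not the paper's detection method.
AANN_RE = re.compile(
    r"\b(a|an)\s+(\w+)\s+(" + "|".join(sorted(NUMERALS)) + r")\s+(\w+)\b",
    re.IGNORECASE,
)


def find_aann(sentence):
    """Return (article, adjective, numeral, noun) for the first match, or None."""
    match = AANN_RE.search(sentence)
    return match.groups() if match else None


def remove_aann_sentences(corpus):
    """Build a counterfactual corpus by dropping sentences that contain a candidate AANN."""
    return [sentence for sentence in corpus if find_aann(sentence) is None]


def perturbed_variants(article, adjective, numeral, noun):
    """Build hypothetical perturbations of an AANN phrase to serve as comparison items."""
    return {
        "order_swap": f"{article} {numeral} {adjective} {noun}",
        "no_article": f"{adjective} {numeral} {noun}",
        "no_adjective": f"{article} {numeral} {noun}",
        "no_numeral": f"{article} {adjective} {noun}",
    }


corpus = [
    "We spent a beautiful five days in Rome.",
    "She waited a few days.",
    "He bought five books.",
]
counterfactual_corpus = remove_aann_sentences(corpus)
variants = perturbed_variants(*find_aann(corpus[0]))
```

In an evaluation along these lines, a model trained on the counterfactual corpus would be expected to assign a higher score to the well-formed phrase than to each entry in `variants`; the scoring function itself is left out here because the abstract does not specify it.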