Language Models Learn Rare Phenomena from Less Rare Phenomena: The Case of the Missing AANNs

📅 2024-03-28
🏛️ Conference on Empirical Methods in Natural Language Processing
📈 Citations: 27
Influential: 1

🤖 AI Summary
This work investigates whether language models acquire rare syntactic constructions through generalization rather than memorization, using the low-frequency English Article+Adjective+Numeral+Noun (AANN) construction (e.g., “a beautiful five days”) as a test case. Method: Transformer language models are trained on human-scale corpora that are systematically manipulated, including counterfactual corpora in which AANN sentences are removed, and are then evaluated on how well they learn the AANN relative to systematically perturbed variants of the construction. Contribution/Results: The results provide an existence proof that LMs can learn a rare grammatical phenomenon by generalizing from less rare ones: models trained on corpora with AANN sentences removed still learn AANNs better than the perturbed variants. Additional counterfactual corpora suggest that this learning draws on related constructions (e.g., “a few days”), and a further experiment shows that it is enhanced when there is more variability in the input.

📝 Abstract
Language models learn rare syntactic phenomena, but the extent to which this is attributable to generalization vs. memorization is a major open question. To that end, we iteratively trained transformer language models on systematically manipulated corpora which were human-scale in size, and then evaluated their learning of a rare grammatical phenomenon: the English Article+Adjective+Numeral+Noun (AANN) construction (“a beautiful five days”). We compared how well this construction was learned on the default corpus relative to a counterfactual corpus in which AANN sentences were removed. We found that AANNs were still learned better than systematically perturbed variants of the construction. Using additional counterfactual corpora, we suggest that this learning occurs through generalization from related constructions (e.g., “a few days”). An additional experiment showed that this learning is enhanced when there is more variability in the input. Taken together, our results provide an existence proof that LMs can learn rare grammatical phenomena by generalization from less rare phenomena. Data and code: https://github.com/kanishkamisra/aannalysis.
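The counterfactual corpus described above, in which AANN sentences are removed, can be pictured as a filtering step over the training sentences. The following is a minimal sketch under strong assumptions, not the authors' pipeline: the word lists, the plural-noun heuristic, and the function names `looks_like_aann` and `remove_aanns` are all hypothetical, and a realistic detector would likely rely on part-of-speech tags or a parser.

```python
import re

# Hypothetical, tiny word lists for illustration only (assumption, not from the paper).
ADJECTIVES = {"beautiful", "mere", "good", "whopping", "lovely"}
NUMERALS = {"two", "three", "four", "five", "six", "ten", "twenty", "hundred"}

TOKEN_RE = re.compile(r"[a-z]+")


def looks_like_aann(sentence: str) -> bool:
    """Return True if any 4-token window matches article + adjective + numeral + plural noun."""
    tokens = TOKEN_RE.findall(sentence.lower())
    for i in range(len(tokens) - 3):
        article, adjective, numeral, noun = tokens[i:i + 4]
        if (
            article in {"a", "an"}
            and adjective in ADJECTIVES
            and numeral in NUMERALS
            and noun.endswith("s")  # crude plural heuristic (assumption)
        ):
            return True
    return False


def remove_aanns(corpus):
    """Build a counterfactual corpus by dropping every sentence flagged as containing an AANN."""
    return [sentence for sentence in corpus if not looks_like_aann(sentence)]


corpus = [
    "We spent a beautiful five days in Rome.",
    "We spent a few days in Rome.",
    "Five beautiful days passed.",
]
filtered = remove_aanns(corpus)
```

In this toy run only the first sentence is dropped; the related construction “a few days” stays in the corpus, which is the kind of evidence the abstract suggests models may generalize from.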
Problem

Research questions and friction points this paper is trying to address.

Study how language models learn rare syntactic phenomena
Determine if learning is from generalization or memorization
Investigate rare AANN construction learning in English
Innovation

Methods, ideas, or system contributions that make the work stand out.

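Iteratively trained transformer language models on systematically manipulated, human-scale corpora
Compared learning of the AANN construction against systematically perturbed variants, with and without AANN sentences in the training corpus
Showed that learning is enhanced when there is more variability in the input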
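The comparison between the AANN and its perturbed variants can be sketched as a simple preference test. This is an illustrative sketch under stated assumptions: the four perturbation types, the function names, and the `score` callable are hypothetical stand-ins (in practice `score` might be a length-normalized log-probability from a trained language model), and the toy scores below are made up purely to show the control flow.

```python
from typing import Callable, Dict


def perturbed_variants(article: str, adjective: str, numeral: str, noun: str) -> Dict[str, str]:
    """Build ill-formed counterparts of an AANN phrase (assumed perturbation types)."""
    return {
        "order_swap": f"{article} {numeral} {adjective} {noun}",
        "no_article": f"{adjective} {numeral} {noun}",
        "no_adjective": f"{article} {numeral} {noun}",
        "no_numeral": f"{article} {adjective} {noun}",
    }


def aann_preferred(
    score: Callable[[str], float],
    article: str,
    adjective: str,
    numeral: str,
    noun: str,
) -> bool:
    """Return True if the well-formed AANN outscores every perturbed variant."""
    well_formed = f"{article} {adjective} {numeral} {noun}"
    target = score(well_formed)
    variants = perturbed_variants(article, adjective, numeral, noun)
    return all(target > score(phrase) for phrase in variants.values())


# Toy scores standing in for a language model (made-up numbers, for illustration only).
toy_scores = {
    "a beautiful five days": -1.0,
    "a five beautiful days": -3.0,
    "beautiful five days": -2.0,
    "a five days": -4.0,
    "a beautiful days": -5.0,
}
result = aann_preferred(toy_scores.__getitem__, "a", "beautiful", "five", "days")
```

Averaging `aann_preferred` over many test phrases would give an accuracy-style number that could be compared across models trained on the default and counterfactual corpora.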