Language Models Learn Rare Phenomena from Less Rare Phenomena: The Case of the Missing AANNs

📅 2024-03-28

🏛️ Conference on Empirical Methods in Natural Language Processing

📈 Citations: 27

✨ Influential: 1

🤖 AI Summary

This work investigates whether language models acquire rare syntactic constructions through generalization rather than memorization, using the rare English Article+Adjective+Numeral+Noun (AANN) construction (e.g., “a beautiful five days”) as a test case. Method: Transformer language models are trained on human-scale corpora that are systematically manipulated, including a counterfactual corpus from which AANN sentences are removed, and are then evaluated on how well they learn the construction relative to systematically perturbed variants of it. Contribution/Results: Models trained without AANN sentences still learned AANNs better than the perturbed variants. Additional counterfactual corpora suggest that this learning occurs through generalization from related constructions (e.g., “a few days”), and a further experiment shows that it is enhanced when the input is more variable. Together, the results provide an existence proof that language models can learn rare grammatical phenomena by generalizing from less rare ones.

📝 Abstract

Language models learn rare syntactic phenomena, but the extent to which this is attributable to generalization vs. memorization is a major open question. To that end, we iteratively trained transformer language models on systematically manipulated corpora which were human-scale in size, and then evaluated their learning of a rare grammatical phenomenon: the English Article+Adjective+Numeral+Noun (AANN) construction (“a beautiful five days”). We compared how well this construction was learned on the default corpus relative to a counterfactual corpus in which AANN sentences were removed. We found that AANNs were still learned better than systematically perturbed variants of the construction. Using additional counterfactual corpora, we suggest that this learning occurs through generalization from related constructions (e.g., “a few days”). An additional experiment showed that this learning is enhanced when there is more variability in the input. Taken together, our results provide an existence proof that LMs can learn rare grammatical phenomena by generalization from less rare phenomena. Data and code: https://github.com/kanishkamisra/aannalysis.
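The counterfactual setup described in the abstract, in which AANN sentences are removed from the training corpus, can be illustrated with a minimal sketch. It assumes a crude regular-expression heuristic: the pattern, the short numeral list, and the names `looks_like_aann` and `remove_aanns` are hypothetical and are not taken from the paper or its repository, whose actual detection procedure is not described on this page.

```python
import re

# Hypothetical, deliberately small numeral list; a real filter would need a fuller one.
NUMERALS = r"(?:\d+|two|three|four|five|six|seven|eight|nine|ten|twenty|hundred|thousand)"

# Heuristic shape: article + one word (presumed adjective) + numeral + one word (presumed noun).
# Without part-of-speech tags this will both over- and under-match.
AANN_PATTERN = re.compile(
    rf"\b(?:a|an)\s+[a-z-]+\s+{NUMERALS}\s+[a-z-]+\b",
    re.IGNORECASE,
)


def looks_like_aann(sentence: str) -> bool:
    """Return True if the sentence contains an AANN-like word sequence."""
    return AANN_PATTERN.search(sentence) is not None


def remove_aanns(sentences):
    """Build a counterfactual corpus by dropping AANN-like sentences."""
    return [s for s in sentences if not looks_like_aann(s)]


corpus = [
    "We spent a beautiful five days in Rome.",
    "We stayed for a few days.",
    "She bought five apples.",
]
counterfactual = remove_aanns(corpus)
```

In this toy corpus only the first sentence is dropped, while the related construction “a few days” is kept, which is the kind of evidence the abstract suggests models generalize from.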

Problem

Research questions and friction points this paper is trying to address.

Study how language models learn rare syntactic phenomena

Determine if learning is from generalization or memorization

Investigate rare AANN construction learning in English

Innovation

Methods, ideas, or system contributions that make the work stand out.

Iteratively trained transformers on manipulated corpora

Compared learning of the AANN construction against systematically perturbed variants

Showed that learning is enhanced by greater variability in the input
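The comparison between AANNs and their perturbed variants can be sketched as an accuracy measure: the share of test items for which a model scores the well-formed AANN above every perturbed version. This is only an illustration under stated assumptions. The two perturbations, the function names, and the `score` callable (standing in for a language model's log-probability) are hypothetical, and the toy scorer below is a placeholder, not a result from the paper.

```python
from typing import Callable, Dict, Iterable, Tuple

Item = Tuple[str, str, str, str]  # (article, adjective, numeral, noun)


def perturb(article: str, adjective: str, numeral: str, noun: str) -> Dict[str, str]:
    """Return the well-formed AANN plus hypothetical perturbed variants."""
    return {
        "aann": f"{article} {adjective} {numeral} {noun}",
        # Assumed perturbation: swap adjective and numeral.
        "numeral_first": f"{article} {numeral} {adjective} {noun}",
        # Assumed perturbation: drop the article.
        "no_article": f"{adjective} {numeral} {noun}",
    }


def aann_accuracy(items: Iterable[Item], score: Callable[[str], float]) -> float:
    """Fraction of items where the AANN outscores every perturbed variant."""
    items = list(items)
    wins = 0
    for item in items:
        variants = perturb(*item)
        target = score(variants.pop("aann"))
        if all(target > score(v) for v in variants.values()):
            wins += 1
    return wins / len(items)


# Placeholder scorer for demonstration only; a real run would use model log-probabilities.
preferred = {"a beautiful five days"}
toy_score = lambda phrase: 1.0 if phrase in preferred else 0.0

items = [("a", "beautiful", "five", "days"), ("a", "mere", "two", "weeks")]
accuracy = aann_accuracy(items, toy_score)
```

Requiring the AANN to beat every variant, rather than the average of them, is a strict design choice; a softer criterion would count per-variant wins separately.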
