Resource-sensitive but language-blind: Community size and not grammatical complexity better predicts the accuracy of Large Language Models in a novel Wug Test

📅 2025-10-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates whether large language models (LLMs) achieve human-level performance in cross-lingual morphological generalization, and whether that performance is governed primarily by intrinsic grammatical complexity or by the scale of available training data. Method: Six state-of-the-art LLMs are evaluated on a multilingual adaptation of the Wug Test across Catalan, English, Greek, and Spanish, and their outputs are compared against human baseline responses. Contribution/Results: Model performance correlates strongly with digital resource density (corpus size and speaker-community activity) rather than with morphosyntactic complexity. LLMs approach human-level accuracy on well-resourced languages (English, Spanish), but accuracy declines on less-resourced ones (Catalan, Greek). These findings indicate that the models' morphological generalization is fundamentally data-driven rather than genuinely grammar-sensitive, challenging the assumption that structural complexity determines generalization difficulty. The results provide empirical grounding for multilingual model evaluation and for optimizing LLMs in data-scarce settings.

📝 Abstract
The linguistic abilities of Large Language Models are a matter of ongoing debate. This study contributes to this discussion by investigating model performance in a morphological generalization task that involves novel words. Using a multilingual adaptation of the Wug Test, six models were tested across four partially unrelated languages (Catalan, English, Greek, and Spanish) and compared with human speakers. The aim is to determine whether model accuracy approximates human competence and whether it is shaped primarily by linguistic complexity or by the quantity of available training data. Consistent with previous research, the results show that the models are able to generalize morphological processes to unseen words with human-like accuracy. However, accuracy patterns align more closely with community size and data availability than with structural complexity, refining earlier claims in the literature. In particular, languages with larger speaker communities and stronger digital representation, such as Spanish and English, showed higher accuracy than less-resourced ones like Catalan and Greek. Overall, our findings suggest that model behavior is driven mainly by the richness of linguistic resources rather than by sensitivity to grammatical complexity, reflecting a form of performance that resembles human linguistic competence only superficially.
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs' morphological generalization ability using novel words
Comparing model accuracy with human performance across four languages
Assessing whether data quantity or linguistic complexity drives model performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Models generalize morphology with human-like accuracy
Accuracy depends on speaker community size and data
Performance driven by resource richness, not grammatical complexity
Nikoleta Pantelidou
Universitat Autònoma de Barcelona
Evelina Leivada
Research Professor at ICREA & Universitat Autònoma de Barcelona
Bilingualism · Language Variation · Language Acquisition · Morphosyntax
Paolo Morosi
Universitat Autònoma de Barcelona