Can Language Models Learn Typologically Implausible Languages?

📅 2025-02-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates whether language models (LMs) exhibit human-like typological learning biases, i.e., whether they systematically struggle to acquire typologically implausible languages such as counterfactual English or Japanese variants that violate the head-directionality universal. Method: the authors construct highly naturalistic, word-order counterfactual artificial language datasets and conduct symmetric cross-lingual training and zero-shot transfer evaluation with Transformer-based LMs. Contribution/Results: the experiments provide the first systematic assessment of LM sensitivity to typological plausibility. LMs converge more slowly on typologically implausible word orders but ultimately achieve only marginally lower performance, revealing a weak yet measurable typological alignment bias. This finding offers empirical support for the hypothesis that linguistic universals arise, at least in part, from domain-general inductive biases inherent in learning systems, and underscores the methodological value of artificial-language experiments for probing the implicit linguistic inductive biases encoded in LMs.
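The counterfactual word-order manipulation described above can be illustrated with a small sketch. This is a hypothetical toy example, not the paper's actual pipeline: it assumes a dependency-tree representation (each node a word with its dependent subtrees) and flips head-directionality by linearizing each head either before (head-initial, English-like) or after (head-final, Japanese-like) its dependents.

```python
def linearize(tree, head_final=False):
    """Flatten a toy dependency tree into a word sequence.

    tree: (word, [child_subtrees]) -- a head word plus its dependents.
    head_final: if True, dependents precede the head (Japanese-like order);
    otherwise the head precedes its dependents (English-like order).
    """
    word, children = tree
    dependents = [w for child in children for w in linearize(child, head_final)]
    if head_final:
        return dependents + [word]
    return [word] + dependents

# Toy verb phrase: the verb "ate" heading the object "apples".
vp = ("ate", [("apples", [])])
print(linearize(vp))                   # head-initial order: verb-object
print(linearize(vp, head_final=True))  # head-final order: object-verb
```

Applying the same flip recursively over full parsed corpora is what yields "counterfactual" language variants whose word order violates (or obeys) the head-directionality universal while preserving the rest of the grammar.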

📝 Abstract
Grammatical features across human languages show intriguing correlations often attributed to learning biases in humans. However, empirical evidence has been limited to experiments with highly simplified artificial languages, and whether these correlations arise from domain-general or language-specific biases remains a matter of debate. Language models (LMs) provide an opportunity to study artificial language learning at a large scale and with a high degree of naturalism. In this paper, we begin with an in-depth discussion of how LMs allow us to better determine the role of domain-general learning biases in language universals. We then assess learnability differences for LMs resulting from typologically plausible and implausible languages closely following the word-order universals identified by linguistic typologists. We conduct a symmetrical cross-lingual study training and testing LMs on an array of highly naturalistic but counterfactual versions of the English (head-initial) and Japanese (head-final) languages. Compared to similar work, our datasets are more naturalistic and fall closer to the boundary of plausibility. Our experiments show that these LMs are often slower to learn these subtly implausible languages, while ultimately achieving similar performance on some metrics regardless of typological plausibility. These findings lend credence to the conclusion that LMs do show some typologically-aligned learning preferences, and that the typological patterns may result from, at least to some degree, domain-general learning biases.
Problem

Research questions and friction points this paper is trying to address.

Do LMs show human-like difficulty in learning typologically implausible languages?
Do word-order universals arise from domain-general learning biases?
How does learnability differ between typologically plausible and implausible languages?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses LMs as large-scale, naturalistic learners of artificial languages
Symmetric cross-lingual training on counterfactual word-order variants of English and Japanese
Datasets that are more naturalistic and closer to the boundary of plausibility than prior work