A Computational Approach to Language Contact -- A Case Study of Persian

📅 2026-01-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study investigates whether the intermediate representations of a monolingual language model retain structural traces of historical language contact. It focuses on Persian: a Persian-trained model is probed on languages that differ in the degree and type of their contact with Persian. The method quantifies how much linguistic information the intermediate representations encode and how that information is distributed across model components for different morphosyntactic features. The results indicate that contact effects in the monolingual model are selective and structurally constrained: universal syntactic information is largely insensitive to historical contact, whereas morphological features such as Case and Gender are strongly shaped by language-specific structure. Sensitivity to contact therefore depends on the type of feature being probed.

📝 Abstract

We investigate structural traces of language contact in the intermediate representations of a monolingual language model. Focusing on Persian (Farsi) as a historically contact-rich language, we probe the representations of a Persian-trained model when exposed to languages with varying degrees and types of contact with Persian. Our methodology quantifies the amount of linguistic information encoded in intermediate representations and assesses how this information is distributed across model components for different morphosyntactic features. The results show that universal syntactic information is largely insensitive to historical contact, whereas morphological features such as Case and Gender are strongly shaped by language-specific structure, suggesting that contact effects in monolingual language models are selective and structurally constrained.
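
To make the idea of quantifying information in intermediate representations concrete, the following is a minimal sketch of a linear probe. It is an illustration only and not the paper's method: the abstract does not specify the probe, so the ridge-regression classifier, the synthetic "layer" vectors, and all names (`fit_linear_probe`, `probe_accuracy`, `make_synthetic_layer`) are assumptions. In a real setup the vectors would be hidden states from the Persian-trained model and the labels would be a morphosyntactic feature such as Case.

```python
import numpy as np

def fit_linear_probe(X, y, n_classes, l2=1e-2):
    """Fit a ridge-regression probe from representations X to one-hot labels."""
    Y = np.eye(n_classes)[y]
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    A = Xb.T @ Xb + l2 * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ Y)

def probe_accuracy(W, X, y):
    """Share of examples whose highest-scoring class matches the label."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return float(np.mean(np.argmax(Xb @ W, axis=1) == y))

def make_synthetic_layer(signal, rng, n_classes=3, dim=16):
    """Hypothetical stand-in for one layer: class means scaled by `signal`, plus unit noise."""
    means = signal * rng.normal(size=(n_classes, dim))
    def sample(y):
        return means[y] + rng.normal(size=(len(y), dim))
    return sample

rng = np.random.default_rng(0)
n_classes = 3  # e.g. three values of a feature; purely illustrative
y_train = rng.integers(0, n_classes, size=600)
y_test = rng.integers(0, n_classes, size=300)

accuracies = {}
for name, signal in [("no_signal", 0.0), ("strong_signal", 2.0)]:
    sample = make_synthetic_layer(signal, rng, n_classes)
    W = fit_linear_probe(sample(y_train), y_train, n_classes)
    accuracies[name] = probe_accuracy(W, sample(y_test), y_test)

print(accuracies)
```

Held-out probe accuracy serves here as a rough proxy for how much of a feature a layer encodes: a layer carrying no feature signal should stay near chance, while a layer with a strong signal should be decoded almost perfectly. Comparing such scores across layers and across input languages is one common way to study how information is distributed in a model.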

Problem

Research questions and friction points this paper is trying to address.

language contact

Persian

monolingual language model

morphosyntactic features

intermediate representations

Innovation

Methods, ideas, or system contributions that make the work stand out.

language contact

monolingual language model

intermediate representations