Is Cross-Lingual Transfer in Bilingual Models Human-Like? A Study with Overlapping Word Forms in Dutch and English

📅 2026-04-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates whether bilingual language models replicate the cross-lingual activation patterns observed in human bilinguals during reading, specifically for cognates and interlingual homographs. The authors train Dutch–English causal Transformer models under four vocabulary-sharing conditions and evaluate them on stimuli from bilingual reading studies, using surprisal and embedding similarity analyses. The models largely keep the two languages separate, and cross-lingual effects arise primarily when embeddings are shared; in those cases, both cognates and interlingual homographs show facilitation relative to controls. The qualitative pattern of human bilinguals is reproduced only when cognates alone share embeddings. Regression analyses indicate that the effects are driven mainly by word frequency rather than by consistency in form-meaning mapping, which underscores the role of vocabulary design in shaping bilingual behavior in computational models.
📝 Abstract
Bilingual speakers show cross-lingual activation during reading, especially for words with shared surface form. Cognates (friends) typically lead to facilitation, whereas interlingual homographs (false friends) cause interference or no effect. We examine whether cross-lingual activation in bilingual language models mirrors these patterns. We train Dutch-English causal Transformers under four vocabulary-sharing conditions that manipulate whether (false) friends receive shared or language-specific embeddings. Using psycholinguistic stimuli from bilingual reading studies, we evaluate the models through surprisal and embedding similarity analyses. The models largely maintain language separation, and cross-lingual effects arise primarily when embeddings are shared. In these cases, both friends and false friends show facilitation relative to controls. Regression analyses reveal that these effects are mainly driven by frequency rather than consistency in form-meaning mapping. Only when just friends share embeddings are the qualitative patterns of bilinguals reproduced. Overall, bilingual language models capture some cross-linguistic activation effects. However, their alignment with human processing seems to critically depend on how lexical overlap is encoded, possibly limiting their explanatory adequacy as models of bilingual reading.
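The two evaluation measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the probabilities, and the vectors below are hypothetical values chosen for illustration. It assumes that a word's surprisal is the sum of the surprisals of its subword tokens, and that embedding similarity is measured with cosine similarity.

```python
import math


def word_surprisal_bits(token_probs):
    """Surprisal of a word in bits, summed over its subword tokens.

    token_probs: conditional probabilities P(token | context) that a
    language model assigns to each subword token of the word.
    """
    for p in token_probs:
        if not 0.0 < p <= 1.0:
            raise ValueError("each probability must be in (0, 1]")
    return sum(-math.log2(p) for p in token_probs)


def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


# Hypothetical model probabilities for a cognate and a matched control word.
cognate_surprisal = word_surprisal_bits([0.25])
control_surprisal = word_surprisal_bits([0.125])

# A positive difference means the cognate is less surprising than the
# control, i.e. a facilitation effect in this toy comparison.
facilitation = control_surprisal - cognate_surprisal

# Hypothetical language-specific embeddings of the same word form.
similarity = cosine_similarity([1.0, 0.0], [1.0, 1.0])
```

In this toy comparison, lower surprisal for the cognate than for its control corresponds to facilitation, and higher surprisal would correspond to interference.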
Problem

Research questions and friction points this paper is trying to address.

cross-lingual transfer
bilingual models
cognates
interlingual homographs
human-like processing
Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-lingual transfer
bilingual language models
cognates
interlingual homographs
embedding sharing
Iza Škrjanec
Saarland University, Germany; Zuse School ELIZA, Germany
Irene Elisabeth Winther
Radboud University, the Netherlands
Vera Demberg
Saarland University, MPI for Informatics, Saarland Informatics Campus
Computational Linguistics, Psycholinguistics, Computer Science, Natural Language Processing, ML
Stefan L. Frank
Radboud University, the Netherlands