Interference Matrix: Quantifying Cross-Lingual Interference in Transformer Encoders

📅 2025-08-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work systematically investigates cross-lingual interference in Transformer encoders under multilingual training. We construct a fine-grained cross-lingual interference matrix across 83 languages, quantifying the asymmetry of performance transfer between language pairs. Contrary to expectations, interference patterns correlate only weakly with conventional linguistic proxies, such as language family or embedding similarity, and are instead strongly governed by writing system (script). The interference matrix also effectively predicts downstream task performance. Methodologically, we train and evaluate many lightweight BERT-like models on all language-pair combinations and conduct a multi-dimensional linguistic correlation analysis. Our study is the first to reveal the fundamental script dependence of cross-lingual interference, providing interpretable, empirically grounded insights for multilingual model architecture design, pretraining strategy optimization, and principled language selection.

📝 Abstract
In this paper, we present a comprehensive study of language interference in encoder-only Transformer models across 83 languages. We construct an interference matrix by training and evaluating small BERT-like models on all possible language pairs, providing a large-scale quantification of cross-lingual interference. Our analysis reveals that interference between languages is asymmetrical and that its patterns do not align with traditional linguistic characteristics, such as language family, nor with proxies like embedding similarity, but instead better relate to script. Finally, we demonstrate that the interference matrix effectively predicts performance on downstream tasks, serving as a tool to better design multilingual models to obtain optimal performance.
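The construction described in the abstract can be sketched in a few lines. The function and score dictionaries below are hypothetical illustrations, not the paper's actual training pipeline: each entry of the matrix is the drop in a language's evaluation score when a second language is added to training, which makes the matrix naturally asymmetric.

```python
import numpy as np

def interference_matrix(mono_scores, pair_scores):
    """Build an asymmetric cross-lingual interference matrix.

    mono_scores[a]      : eval score of a model trained only on language a
    pair_scores[(a, b)] : eval score ON language a of a model trained
                          jointly on languages a and b
    M[i, j] > 0 means language j hurts language i (interference);
    M[i, j] < 0 means language j helps language i (positive transfer).
    """
    langs = sorted(mono_scores)
    n = len(langs)
    M = np.zeros((n, n))
    for i, a in enumerate(langs):
        for j, b in enumerate(langs):
            if a != b:
                M[i, j] = mono_scores[a] - pair_scores[(a, b)]
    return langs, M

# Toy example with made-up scores for three languages (the paper uses 83):
mono = {"en": 0.90, "ru": 0.85, "zh": 0.80}
pairs = {
    ("en", "ru"): 0.88, ("ru", "en"): 0.84,
    ("en", "zh"): 0.86, ("zh", "en"): 0.79,
    ("ru", "zh"): 0.83, ("zh", "ru"): 0.77,
}
langs, M = interference_matrix(mono, pairs)
```

Note that `M[en, ru]` and `M[ru, en]` differ even in this toy case, which is exactly the asymmetry the paper reports.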
Problem

Research questions and friction points this paper aims to address.

Quantify cross-lingual interference in Transformer encoders
Analyze asymmetrical interference patterns unrelated to linguistic families
Predict downstream task performance using interference matrix
Innovation

Methods, ideas, or system contributions that make the work stand out.

Construct interference matrix for 83 languages
Analyze asymmetrical cross-lingual interference patterns
Use interference matrix for downstream task prediction
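One way a matrix like this could feed downstream prediction is a simple regression from per-language interference features to task scores. This is a toy sketch on simulated data, not the paper's actual predictive model: the feature is the mean interference each language receives, and the "downstream" scores are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
n_langs = 83

# Stand-in interference matrix (the real one comes from pair trainings).
M = rng.normal(0.0, 0.05, (n_langs, n_langs))
np.fill_diagonal(M, 0.0)

# Feature: mean interference each language receives from the others.
mean_incoming = M.mean(axis=1)

# Simulated downstream scores that degrade with incoming interference.
downstream = 0.8 - 1.5 * mean_incoming + rng.normal(0.0, 0.002, n_langs)

# Least-squares linear fit: downstream ≈ w * mean_incoming + b.
X = np.column_stack([mean_incoming, np.ones(n_langs)])
(w, b), *_ = np.linalg.lstsq(X, downstream, rcond=None)
pred = X @ np.array([w, b])
r = np.corrcoef(pred, downstream)[0, 1]
```

In a real use, the fitted relationship would let one rank candidate pretraining languages by their expected cost to a target language before running any full-scale training.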