🤖 AI Summary
This work systematically investigates cross-lingual interference in Transformer encoders in multilingual settings. We construct a fine-grained cross-lingual interference matrix spanning 83 languages, quantifying the asymmetry of performance transfer between language pairs. Contrary to expectations, interference patterns correlate only weakly with conventional linguistic proxies such as language family or embedding similarity, and are instead strongly governed by writing system (script). Moreover, the interference matrix effectively predicts downstream task performance. Methodologically, we train and evaluate a large number of lightweight BERT-like models on all language-pair combinations and conduct a multi-dimensional linguistic correlation analysis. Our study is the first to reveal the fundamental script dependence of cross-lingual interference, providing interpretable, empirically grounded guidance for multilingual model architecture design, pretraining strategy optimization, and principled language selection.
📝 Abstract
In this paper, we present a comprehensive study of language interference in encoder-only Transformer models across 83 languages. We construct an interference matrix by training and evaluating small BERT-like models on all possible language pairs, providing a large-scale quantification of cross-lingual interference. Our analysis reveals that interference between languages is asymmetric and that its patterns align neither with traditional linguistic characteristics, such as language family, nor with proxies like embedding similarity, but are instead better explained by script. Finally, we demonstrate that the interference matrix effectively predicts performance on downstream tasks, making it a practical tool for designing multilingual models that achieve optimal performance.
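To make the notion of an asymmetric interference matrix concrete, the sketch below shows one plausible way such scores could be computed; this is an illustration, not the paper's actual implementation. The helpers `train_eval_monolingual` and `train_eval_bilingual` are hypothetical callbacks assumed to train a small BERT-like model on the given language(s) and return an evaluation loss for a target language.

```python
from itertools import permutations

def interference_matrix(languages, train_eval_monolingual, train_eval_bilingual):
    """Build an asymmetric interference matrix over ordered language pairs.

    interference[a][b] > 0 means co-training with language `b` hurts
    performance on language `a` relative to a monolingual baseline;
    negative values indicate positive transfer.
    """
    # Monolingual baselines: evaluation loss of a model trained only on `lang`.
    mono_loss = {lang: train_eval_monolingual(lang) for lang in languages}

    interference = {a: {} for a in languages}
    for a, b in permutations(languages, 2):
        # Loss on `a` after joint training on the ordered pair (a, b).
        joint_loss = train_eval_bilingual(a, b, eval_lang=a)
        # Relative degradation of `a` caused by adding `b` to the training mix.
        interference[a][b] = (joint_loss - mono_loss[a]) / mono_loss[a]
    return interference
```

Because the score is computed separately for each ordered pair, `interference[a][b]` need not equal `interference[b][a]`, which is exactly the asymmetry the abstract refers to.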