Measuring cross-language intelligibility between Romance languages with computational tools

📅 2026-02-07

🤖 AI Summary

This study quantifies cross-lingual intelligibility among five Romance languages (French, Italian, Portuguese, Spanish, and Romanian), with particular attention to its asymmetry. It proposes a computational metric that combines the surface similarity of related words, measured on both orthographic and phonetic forms, with their semantic similarity, computed from word embedding models. Scores are compared across different parallel corpora and embedding models; they correlate significantly with human performance on cloze tests and reflect the expected asymmetries between language pairs. The approach offers a scalable, data-driven way to model cross-lingual comprehension.

📝 Abstract

We present an analysis of mutual intelligibility between related languages, applied to the Romance family. We introduce a novel computational metric that estimates intelligibility from lexical similarity, combining the surface and semantic similarity of related words. We use it to measure mutual intelligibility for the five main Romance languages (French, Italian, Portuguese, Spanish, and Romanian), and we compare results obtained with the orthographic and phonetic forms of words, with different parallel corpora, and with different vector models of word meaning. The resulting intelligibility scores confirm intuitions about the asymmetry of intelligibility across languages and correlate significantly with the results of cloze tests in human experiments.
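
The abstract does not give the metric's formula, so the following is only an illustrative sketch of how a score combining surface and semantic similarity might be computed, not the authors' method. Everything in it is an assumption made for illustration: the function names, the normalized edit distance as the surface measure, the cosine similarity over embeddings assumed to live in a shared cross-lingual space, the mixing weight `alpha`, and the frequency weighting used to make the score direction-dependent.

```python
from math import sqrt


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def surface_similarity(a: str, b: str) -> float:
    """Normalized edit similarity in [0, 1]; works on spellings or phonetic strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def cosine(u, v) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def intelligibility(pairs, text_freq, alpha=0.5):
    """Hypothetical score for how well a reader understands a text language.

    pairs: list of (text_word, reader_word, text_vec, reader_vec) for related words.
    text_freq: frequency of each text_word in the text language (assumed weighting).
    alpha: assumed weight of surface similarity versus semantic similarity.
    """
    total = sum(text_freq[t] for t, _, _, _ in pairs)
    score = 0.0
    for t, r, t_vec, r_vec in pairs:
        semantic = max(0.0, cosine(t_vec, r_vec))  # clip negative cosines to 0
        sim = alpha * surface_similarity(t, r) + (1 - alpha) * semantic
        score += (text_freq[t] / total) * sim
    return score


# Toy example: Spanish "noche" read by an Italian speaker who knows "notte".
# The vectors are made-up placeholders, not real embeddings.
pairs = [("noche", "notte", [1.0, 0.0], [1.0, 0.0])]
print(intelligibility(pairs, {"noche": 10}))
```

In this sketch the asymmetry comes only from weighting each word pair by its frequency in the language being read, so swapping the roles of the two languages changes the weights and therefore the score. Passing phonetic transcriptions instead of spellings to `surface_similarity` would give a phonetic variant of the same score.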

Problem

Research questions and friction points this paper is trying to address.

cross-language intelligibility

Romance languages

lexical similarity

mutual intelligibility

computational linguistics

Innovation

Methods, ideas, or system contributions that make the work stand out.

computational metric

mutual intelligibility

lexical similarity