Measuring cross-language intelligibility between Romance languages with computational tools

📅 2026-02-07
🤖 AI Summary
This study addresses the quantification of cross-lingual intelligibility among Romance languages—French, Italian, Portuguese, Spanish, and Romanian—with a particular focus on its asymmetry. The work proposes a novel computational metric that integrates orthographic, phonetic, and multi-source semantic embeddings to jointly capture lexical surface and semantic similarity. Leveraging parallel corpora and word embedding models, the resulting intelligibility scores exhibit strong correlation with human performance in cloze-task experiments, effectively capturing and validating the asymmetric nature of mutual intelligibility across these languages. This approach offers a scalable, data-driven paradigm for modeling cross-lingual comprehension, advancing both theoretical understanding and practical applications in multilingual NLP.

📝 Abstract
We present an analysis of mutual intelligibility between related languages, applied to the Romance family. We introduce a novel computational metric that estimates intelligibility from lexical similarity, combining the surface and semantic similarity of related words. We use it to measure mutual intelligibility for the five main Romance languages (French, Italian, Portuguese, Spanish, and Romanian), comparing results obtained with both the orthographic and phonetic forms of words, as well as with different parallel corpora and vector-space models of word meaning. The resulting intelligibility scores confirm intuitions about intelligibility asymmetry across languages and correlate significantly with the results of cloze tests in human experiments.
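To make the idea of a combined surface-and-semantic score concrete, here is a minimal Python sketch. It is not the authors' metric, since the abstract does not give its formula. The sketch assumes that surface similarity is a normalized Levenshtein similarity, that semantic similarity is the cosine between word vectors, that the two are mixed with a hypothetical weight `alpha`, and that each word pair is weighted by the frequency of the word in the language being read. Under these assumptions, asymmetry can arise because the frequency weights differ between the two reading directions. All function names and the input layout are illustrative.

```python
from math import sqrt


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def surface_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]: one minus the length-normalized edit distance."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def cosine(u, v) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def intelligibility(pairs, alpha: float = 0.5) -> float:
    """Frequency-weighted mean of mixed surface/semantic similarity.

    `pairs` is a list of tuples
    (word_read, word_known, vec_read, vec_known, freq_of_word_read),
    an assumed layout chosen for this illustration only.
    """
    total = sum(freq for *_, freq in pairs)
    if total == 0:
        return 0.0
    score = 0.0
    for w_read, w_known, v_read, v_known, freq in pairs:
        sim = (alpha * surface_similarity(w_read, w_known)
               + (1 - alpha) * cosine(v_read, v_known))
        score += freq * sim
    return score / total
```

The same word forms could be passed as orthographic strings or as phonetic transcriptions, which mirrors the comparison the abstract mentions. Scoring each reading direction with its own frequency weights yields two numbers per language pair, which need not be equal.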
Problem

Research questions and friction points this paper is trying to address.

cross-language intelligibility
Romance languages
lexical similarity
mutual intelligibility
computational linguistics
Innovation

Methods, ideas, or system contributions that make the work stand out.

computational metric
mutual intelligibility
lexical similarity
semantic similarity
Romance languages
Authors
Liviu P Dinu
University of Bucharest, Faculty of Mathematics and Computer Science, HLT Research Center
Ana Sabina Uban
University of Bucharest, Faculty of Mathematics and Computer Science, HLT Research Center
Bogdan Iordache
University of Bucharest, Faculty of Mathematics and Computer Science, HLT Research Center
Anca Dinu
University of Bucharest
computational linguistics, natural language semantics, machine learning
Simona Georgescu
University of Bucharest, Faculty of Foreign Languages and Literatures, HLT Research Center