Form and Meaning in Intrinsic Multilingual Evaluations

📅 2026-01-15
🤖 AI Summary
This study addresses the lack of theoretical grounding for the cross-lingual comparability of intrinsic evaluation metrics, such as perplexity, on multilingual parallel sentences. It highlights how the implicit assumption of "semantic equivalence" across translations overlooks the tension between linguistic form and meaning. The authors present a systematic analysis of six widely used intrinsic metrics on two multilingual parallel corpora, comparing evaluation outcomes between monolingual and multilingual language models. Their findings show that form-meaning mismatches substantially undermine the cross-lingual comparability of these metrics. The work identifies a fundamental limitation in current evaluation paradigms, offers a theoretical explanation rooted in the interplay between form and meaning, and calls for a critical re-evaluation of assessment standards for multilingual language models.

📝 Abstract
Intrinsic evaluation metrics for conditional language models (CLMs), such as perplexity or bits-per-character, are widely used in both mono- and multilingual settings. These metrics are rather straightforward to use and compare in monolingual setups, but rest on a number of assumptions in multilingual setups. One such assumption is that comparing the perplexity of CLMs on parallel sentences is indicative of their quality, since the information content (here understood as the semantic meaning) is the same. However, the metrics inherently measure information content in the information-theoretic sense. We make this and other such assumptions explicit and discuss their implications. We perform experiments with six metrics on two multi-parallel corpora, with both mono- and multilingual models. Ultimately, we find that current metrics are not universally comparable. We look at the form-meaning debate to provide some explanation for this.
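To make the two metrics named in the abstract concrete, here is a minimal sketch of how perplexity and bits-per-character are computed from per-token log-probabilities. The function names and the example log-probabilities are illustrative assumptions, not the paper's implementation; any real model would supply the log-probabilities.

```python
import math

def perplexity(token_logprobs):
    """Perplexity: exp of the mean negative natural-log probability per token.

    token_logprobs: list of natural-log probabilities, one per token.
    """
    n = len(token_logprobs)
    return math.exp(-sum(token_logprobs) / n)

def bits_per_character(token_logprobs, text):
    """Bits-per-character: total negative log2-probability divided by the
    character count of the text, so the score is comparable across
    tokenizations (a key reason BPC is used in multilingual settings).
    """
    total_bits = -sum(lp / math.log(2) for lp in token_logprobs)
    return total_bits / len(text)

# Hypothetical example: four tokens, each assigned probability 0.5.
logprobs = [math.log(0.5)] * 4
print(perplexity(logprobs))              # 2.0
print(bits_per_character(logprobs, "abcd"))  # 1.0 bit per character
```

Note that perplexity is per *token* while BPC is per *character*: the same model output yields different cross-lingual comparisons depending on which normalizer is chosen, which is one of the implicit assumptions the paper makes explicit.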
Problem

Research questions and friction points this paper is trying to address.

intrinsic evaluation
multilingual language models
perplexity
form-meaning distinction
parallel corpora
Innovation

Methods, ideas, or system contributions that make the work stand out.

intrinsic evaluation
multilingual language models
perplexity
form-meaning distinction
cross-lingual comparability