AI Summary
This study addresses the limitations of conventional BLEU scores in evaluating machine translation quality for extremely low-resource languages such as Magahi, Bhojpuri, and Chhattisgarhi. Through systematic empirical analysis of outputs from both neural machine translation (NMT) systems and large language models (LLMs), the work investigates the sensitivity of the character-level metric ChrF++ and the n-gram-based metric BLEU to common pathologies, including hallucination, repetition, source copying, and diacritic variation. The findings reveal that although BLEU tends to yield lower scores in low-resource settings, its capacity to capture lexical precision effectively complements the strengths of ChrF++. Jointly leveraging both metrics substantially enhances the comprehensiveness and interpretability of translation quality assessment, offering a practical evaluation paradigm for low-resource scenarios.
Abstract
Evaluating machine translation (MT) quality in extremely low-resource language (ELRL) scenarios poses unique challenges, as widely used metrics such as BLEU, effective in high-resource settings, often misrepresent quality in data-scarce contexts. This work presents a comparative analysis of BLEU, an n-gram-based metric, and ChrF++, a character-based metric, for MT evaluation in ELRL settings. We examine how each metric responds to translation artifacts, including hallucinations, repetition, source-text copying, and diacritic (matra) variations, across three ELRLs: Magahi, Bhojpuri, and Chhattisgarhi, with a focus on outputs from large language models (LLMs) and neural MT (NMT) systems. While recent work often relies solely on ChrF++, our findings show that BLEU, despite its lower absolute scores, provides complementary lexical-precision insights that improve interpretability.
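The complementarity between character-level and word-level matching can be illustrated with a toy sketch. Note this is a simplified single-order version of each metric, not the actual BLEU or ChrF++ implementations (real ChrF++ averages character orders 1–6 plus word 1-/2-grams, and BLEU combines orders 1–4 with a brevity penalty): a degenerate, repetitive hypothesis still earns partial character n-gram credit, while clipped word n-gram precision drops to zero.

```python
from collections import Counter

def ngrams(seq, n):
    """Count all n-grams in a sequence (a string for characters, a list for words)."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))

def clipped_overlap(hyp_counts, ref_counts):
    """Credit each hypothesis n-gram at most as often as it occurs in the reference."""
    return sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())

def char_f(hyp, ref, n=3, beta=2.0):
    """ChrF-style character n-gram F-beta score (single order, spaces removed)."""
    h, r = ngrams(hyp.replace(" ", ""), n), ngrams(ref.replace(" ", ""), n)
    if not h or not r:
        return 0.0
    m = clipped_overlap(h, r)
    prec, rec = m / sum(h.values()), m / sum(r.values())
    if prec + rec == 0:
        return 0.0
    return (1 + beta**2) * prec * rec / (beta**2 * prec + rec)

def word_precision(hyp, ref, n=2):
    """BLEU-style clipped word n-gram precision (single order, no brevity penalty)."""
    h, r = ngrams(hyp.split(), n), ngrams(ref.split(), n)
    return clipped_overlap(h, r) / max(sum(h.values()), 1)

ref = "the cat sat on the mat"
repetitive = "the the the the the the"
print(char_f(repetitive, ref))         # repetition still earns character-level credit (> 0)
print(word_precision(repetitive, ref)) # 0.0: no matching word bigrams at all
```

This mirrors the pathology sensitivity studied above: the character metric alone can mask degenerate outputs such as repetition loops, whereas word-level clipped precision exposes them, which is why reporting both scores improves interpretability.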