Evaluating Extremely Low-Resource Machine Translation: A Comparative Study of ChrF++ and BLEU Metrics

πŸ“… 2026-02-19
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This study addresses the limitations of conventional BLEU scores in evaluating machine translation quality for extremely low-resource languages such as Magahi, Bhojpuri, and Chhattisgarhi. Through systematic empirical analysis of outputs from both neural machine translation (NMT) systems and large language models (LLMs), the work investigates the sensitivity of the character-level metric ChrF++ and the n-gram-based BLEU to common pathologies, including hallucination, repetition, source copying, and diacritic variation. The findings reveal that although BLEU tends to yield lower scores in low-resource settings, its capacity to capture lexical precision effectively complements the strengths of ChrF++. Jointly leveraging both metrics substantially enhances the comprehensiveness and interpretability of translation quality assessment in low-resource scenarios.

πŸ“ Abstract
Evaluating machine translation (MT) quality in extremely low-resource language (ELRL) scenarios poses unique challenges, as widely used metrics such as BLEU, effective in high-resource settings, often misrepresent quality in data-scarce contexts. This work presents a comparative analysis of BLEU, an n-gram-based metric, and ChrF++, a character-based metric, for MT evaluation in ELRL settings. We examine how each metric responds to translation artifacts, including hallucinations, repetition, source-text copying, and diacritic (*matra*) variations across three ELRLs: Magahi, Bhojpuri, and Chhattisgarhi, with a focus on outputs from large language models (LLMs) and neural MT (NMT) systems. While recent work often relies solely on ChrF++, our findings show that BLEU, despite its lower absolute scores, provides complementary lexical-precision insights that improve interpretability.
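The contrast the abstract draws can be illustrated with a toy example. The sketch below is *not* the official BLEU or chrF++ algorithm (no brevity penalty, smoothing, or mixed word/character n-gram orders; sacrebleu is the standard reference implementation): it computes a simple n-gram F1 over words versus over characters, showing how a single diacritic-style spelling change (`ja` vs `jaa`, a hypothetical romanized example) barely moves a character-level score but sharply lowers a word-level one.

```python
# Toy word-level vs character-level n-gram F1, illustrating why
# character-based metrics (chrF++-like) are more forgiving of
# diacritic/spelling variation than word-based metrics (BLEU-like).
# Simplified for illustration only -- not the official metric definitions.
from collections import Counter

def ngrams(seq, n):
    """Multiset of n-grams of a sequence (list of words or characters)."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))

def ngram_f1(hyp, ref, n, char_level):
    """Overlap-based F1 over word n-grams or character n-grams."""
    h = list(hyp) if char_level else hyp.split()
    r = list(ref) if char_level else ref.split()
    hc, rc = ngrams(h, n), ngrams(r, n)
    if not hc or not rc:
        return 0.0
    overlap = sum((hc & rc).values())  # clipped n-gram matches
    p = overlap / sum(hc.values())     # precision
    q = overlap / sum(rc.values())     # recall
    return 0.0 if p + q == 0 else 2 * p * q / (p + q)

# Hypothetical romanized sentence pair differing by one vowel spelling.
ref = "mai ghar ja raha hun"
hyp = "mai ghar jaa raha hun"

word_f1 = ngram_f1(hyp, ref, n=2, char_level=False)  # word bigrams
char_f1 = ngram_f1(hyp, ref, n=2, char_level=True)   # character bigrams
print(f"word-bigram F1: {word_f1:.2f}, char-bigram F1: {char_f1:.2f}")
```

The spelling change breaks two of the four word bigrams (word F1 drops to 0.50) while almost all character bigrams still match (char F1 stays above 0.9), mirroring the paper's observation that ChrF++ is robust to *matra* variation where BLEU penalizes it heavily.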
Problem

Research questions and friction points this paper is trying to address.

extremely low-resource machine translation
MT evaluation
BLEU
ChrF++
translation artifacts
Innovation

Methods, ideas, or system contributions that make the work stand out.

extremely low-resource machine translation
BLEU
ChrF++
evaluation metrics
large language models
πŸ”Ž Similar Papers
No similar papers found.
Sanjeev Kumar
Dept. of CSE, IIT Bombay, Mumbai, India
Preethi Jyothi
Associate Professor of Computer Science and Engineering, Indian Institute of Technology Bombay
Speech Recognition · Natural Language Processing · Machine Learning
Pushpak Bhattacharyya
Dept. of CSE, IIT Bombay, Mumbai, India