🤖 AI Summary
This paper addresses the lack of a quantitative, graded, and generalizable measure of “translationese” in translated text. It proposes the translationese-index (T-index), which the authors present as the first quantitative measure of translationese. T-index is computed from the likelihood ratio of two contrastively fine-tuned language models (0.5B parameters each), yielding a continuous score of translationese intensity that generalizes across domains. Relative differences in T-index between translations predict pairwise translationese annotations from human annotators, and absolute T-index values correlate with human ratings of the degree of translationese (Pearson’s r = 0.568). Fine-tuning on only 1–5k pairs of synthetic data is enough to capture translationese in the wild. T-index correlates only weakly with existing machine translation metrics such as BLEU and COMET, which suggests that it measures something those metrics do not cover and can serve as a complementary metric.
📝 Abstract
In this paper, we propose the first quantitative measure of translationese: the translationese-index (T-index), a graded and generalizable measure computed from the likelihood ratios of two contrastively fine-tuned language models (LMs). We use a synthesized dataset and a dataset of translations in the wild to evaluate T-index's generalizability in cross-domain settings and its validity against human judgments. Our results show that T-index is both robust and efficient: when scored by two 0.5B LMs fine-tuned on only 1-5k pairs of synthetic data, it captures translationese in the wild well. We find that the relative differences in T-indices between translations predict pairwise translationese annotations obtained from human annotators, and that the absolute values of T-indices correlate with human ratings of the degree of translationese (Pearson's $r = 0.568$). Additionally, the correlation between T-index and existing machine translation (MT) quality estimation (QE) metrics such as BLEU and COMET is low, suggesting that T-index is not covered by these metrics and can serve as a complementary metric in MT QE.
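To make the likelihood-ratio idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes T-index can be read as a length-normalized difference of log-likelihoods under a "translationese" LM and a "natural" LM; the normalization, the sign convention, and the helper names (`unigram_scorer`, `t_index`, `pairwise_agreement`, `pearson`) are all hypothetical. Toy unigram models stand in for the two fine-tuned 0.5B LMs so that the example runs without any model download.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical scorer interface: maps a text to its per-token log-probabilities.
TokenLogProbs = Callable[[str], List[float]]


def unigram_scorer(corpus: Sequence[str]) -> TokenLogProbs:
    """Toy add-one-smoothed unigram model standing in for a fine-tuned LM."""
    counts = Counter(tok for sent in corpus for tok in sent.lower().split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for unseen tokens

    def score(text: str) -> List[float]:
        return [
            math.log((counts[tok] + 1) / (total + vocab))
            for tok in text.lower().split()
        ]

    return score


def t_index(text: str, lm_translationese: TokenLogProbs,
            lm_natural: TokenLogProbs) -> float:
    """Assumed form: per-token log-likelihood ratio (higher = more translationese)."""
    lp_t = lm_translationese(text)
    lp_n = lm_natural(text)
    return (sum(lp_t) - sum(lp_n)) / max(len(lp_t), 1)


def pairwise_agreement(scores_a: Sequence[float], scores_b: Sequence[float],
                       a_more_translationese: Sequence[bool]) -> float:
    """Fraction of pairs where the sign of the score difference matches the human label."""
    hits = sum((a > b) == label
               for a, b, label in zip(scores_a, scores_b, a_more_translationese))
    return hits / len(a_more_translationese)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Plain Pearson correlation between scores and human ratings."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy data (invented for illustration only, not from the paper).
lm_t = unigram_scorer(["it is the case that the meeting was held by us"])
lm_n = unigram_scorer(["we held the meeting"])
literal = t_index("the meeting was held by us", lm_t, lm_n)
fluent = t_index("we held the meeting", lm_t, lm_n)
```

In this toy setup `literal` comes out larger than `fluent`, which is the kind of relative difference the abstract compares against pairwise human annotations. With real models, the two scorers would instead return token log-probabilities from the contrastively fine-tuned LMs.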