🤖 AI Summary
This work addresses the susceptibility of multilingual large language models (LLMs) to "translationese" bias when serving as evaluators, namely their tendency to favor machine-translated text over human-written content, particularly in low-resource languages. To mitigate this systematic bias, the authors propose DIBJudge, a novel framework that, for the first time, introduces the disentangled information bottleneck to multilingual evaluation. Through variational information compression, DIBJudge learns minimal sufficient representations for judgment while explicitly isolating the spurious correlations responsible for bias into a separate branch. A cross-covariance penalty further enforces effective disentanglement between the robust and biased features. Experimental results demonstrate that DIBJudge significantly outperforms strong baselines on both multilingual reward modeling benchmarks and a dedicated translationese bias evaluation suite, effectively alleviating this systematic bias.
📝 Abstract
Large language models (LLMs) have become a standard for multilingual evaluation, yet they exhibit a severe systematic translationese bias. In this paper, translationese bias is characterized as LLMs systematically favoring machine-translated text over human-authored references, particularly in low-resource languages. We attribute this bias to spurious correlations with (i) latent manifold alignment with English and (ii) cross-lingual predictability. To mitigate this bias, we propose DIBJudge, a robust fine-tuning framework that learns a minimal sufficient, judgment-critical representation via variational information compression, while explicitly isolating spurious factors into a dedicated bias branch. Furthermore, we incorporate a cross-covariance penalty that explicitly suppresses statistical dependence between the robust and bias representations, thereby encouraging effective disentanglement. Extensive evaluations on multilingual reward modeling benchmarks and a dedicated translationese bias evaluation suite demonstrate that DIBJudge consistently outperforms strong baselines and substantially mitigates translationese bias.
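To make the cross-covariance penalty concrete, the following is a minimal NumPy sketch of one common formulation: the squared Frobenius norm of the cross-covariance matrix between the two representation branches. The function name, the use of NumPy, and the batch shapes are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def cross_covariance_penalty(z_robust, z_bias):
    """Squared Frobenius norm of the cross-covariance between two
    representation batches (rows = samples). Driving this toward zero
    suppresses linear statistical dependence between the branches.
    NOTE: hypothetical sketch, not DIBJudge's exact loss."""
    zr = z_robust - z_robust.mean(axis=0, keepdims=True)  # center each feature
    zb = z_bias - z_bias.mean(axis=0, keepdims=True)
    n = z_robust.shape[0]
    cov = zr.T @ zb / (n - 1)   # (d_robust, d_bias) cross-covariance matrix
    return float(np.sum(cov ** 2))  # scalar penalty added to the training loss

# Strongly dependent branches yield a much larger penalty than independent ones.
rng = np.random.default_rng(0)
z = rng.normal(size=(256, 8))
p_dependent = cross_covariance_penalty(z, z)
p_independent = cross_covariance_penalty(z, rng.normal(size=(256, 8)))
print(p_dependent > p_independent)
```

In training, such a term would be weighted and added to the judgment loss, so the optimizer simultaneously fits the evaluation objective and decorrelates the robust branch from the bias branch.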