Cross-Lingual LLM-Judge Transfer via Evaluation Decomposition

2026-03-19
AI Summary
This work addresses the scarcity of high-quality automatic evaluation methods for large language models in non-English settings, a problem made worse by the limited availability and high cost of human-annotated judgments in target languages. The paper proposes a cross-lingual evaluation framework based on evaluation decomposition. Its core is a Universal Criteria Set (UCS): a shared, language-agnostic set of evaluation dimensions that yields an interpretable intermediate representation. Because the dimensions are shared across languages, an LLM-based judge can be transferred to a new language without human annotations in that language. On multiple faithfulness evaluation tasks, across languages and model backbones, the framework consistently improves over strong baselines while keeping the evaluation interpretable.

πŸ“ Abstract
As large language models are increasingly deployed across diverse real-world applications, extending automated evaluation beyond English has become a critical challenge. Existing evaluation approaches are predominantly English-focused, and adapting them to other languages is hindered by the scarcity and cost of human-annotated judgments in most languages. We introduce a decomposition-based evaluation framework built around a Universal Criteria Set (UCS). UCS consists of a shared, language-agnostic set of evaluation dimensions, producing an interpretable intermediate representation that supports cross-lingual transfer with minimal supervision. Experiments on multiple faithfulness tasks across languages and model backbones demonstrate consistent improvements over strong baselines without requiring target-language annotations.
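
To make the idea of an intermediate representation concrete, here is a minimal Python sketch of one way decomposition-based transfer could work. It is an illustration under stated assumptions, not the paper's implementation. The criterion names, the score values, and the logistic-regression aggregator are all hypothetical; this summary does not list the actual UCS dimensions or say how they are combined. In practice an LLM judge would produce the per-criterion scores, which are hard-coded here.

```python
import math

# Hypothetical criteria: the actual Universal Criteria Set is not listed in this summary.
CRITERIA = ["no_contradiction", "no_unsupported_claims", "entity_consistency"]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_aggregator(vectors, labels, epochs=500, lr=0.5):
    """Fit logistic-regression weights over criterion scores with plain SGD."""
    weights = [0.0] * len(CRITERIA)
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(weights, bias, x):
    """Return the estimated probability that an output is faithful."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Made-up criterion scores in [0, 1] for source-language (e.g. English) examples,
# paired with human faithfulness labels (1 = faithful, 0 = unfaithful).
source_vectors = [[1.0, 0.9, 1.0], [0.9, 1.0, 0.8], [0.2, 0.1, 0.5], [0.0, 0.3, 0.2]]
source_labels = [1, 1, 0, 0]

weights, bias = train_aggregator(source_vectors, source_labels)

# Target-language outputs are scored on the same criteria, so the aggregator
# trained on source-language labels applies without target-language annotations.
target_vectors = [[0.95, 0.9, 0.9], [0.1, 0.2, 0.3]]
target_preds = [predict(weights, bias, x) for x in target_vectors]
```

The point of the design is that only the criterion scoring step touches the target language. The aggregator sees language-independent vectors, and each prediction can be traced back to the individual criterion scores.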

Problem

Research questions and friction points this paper is trying to address.

Cross-Lingual Evaluation
LLM-Judge
Evaluation Transfer
Low-Resource Languages
Automated Evaluation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-Lingual Transfer
Evaluation Decomposition
Universal Criteria Set
LLM-Judge
Language-Agnostic Evaluation