🤖 AI Summary
This work addresses the scarcity of high-quality automatic evaluation methods for large language models in non-English settings, a challenge exacerbated by the limited availability and high cost of human-annotated data in target languages. To overcome this, the paper proposes a cross-lingual evaluation framework based on evaluation decomposition. It introduces, for the first time, a language-agnostic Universal Criteria Set (UCS) that produces interpretable intermediate representations and thereby enables cross-lingual transfer without human annotations in the target language. By combining LLM-based automatic judgment with transfer learning, the framework substantially outperforms strong baselines on faithfulness evaluation tasks across multiple languages and model architectures, while also improving the interpretability and generalization of the evaluation system.
📄 Abstract
As large language models are increasingly deployed across diverse real-world applications, extending automated evaluation beyond English has become a critical challenge. Existing evaluation approaches are predominantly English-focused, and adapting them to other languages is hindered by the scarcity and cost of human-annotated judgments in most languages. We introduce a decomposition-based evaluation framework built around a Universal Criteria Set (UCS): a shared, language-agnostic set of evaluation dimensions that produces an interpretable intermediate representation, supporting cross-lingual transfer with minimal supervision. Experiments on multiple faithfulness tasks across languages and model backbones demonstrate consistent improvements over strong baselines without requiring target-language annotations.
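To make the decomposition concrete, below is a minimal Python sketch of how a UCS-style judge could be wired up. The criteria names, prompt wording, `judge` callable, and simple averaging aggregation are all illustrative assumptions; the abstract does not specify the paper's actual dimensions, prompts, or aggregation scheme.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical criteria for illustration; the abstract does not
# enumerate the paper's actual UCS dimensions.
UNIVERSAL_CRITERIA = [
    "factual_consistency",  # every claim is supported by the source
    "completeness",         # no key source content is omitted
    "no_hallucination",     # nothing is invented beyond the source
]

@dataclass
class CriterionJudgment:
    criterion: str
    score: float    # normalized to [0, 1]
    rationale: str  # free-text explanation from the judge

def evaluate_faithfulness(
    source: str,
    output: str,
    judge: Callable[[str], tuple[float, str]],
) -> tuple[float, list[CriterionJudgment]]:
    """Decompose evaluation into one LLM judgment per criterion,
    then aggregate. The per-criterion judgments serve as the
    interpretable intermediate representation: the criteria are
    shared across languages, only the judged text changes."""
    judgments = []
    for criterion in UNIVERSAL_CRITERIA:
        prompt = (
            f"Criterion: {criterion}\n"
            f"Source:\n{source}\n\n"
            f"Output:\n{output}\n\n"
            "Rate how well the output satisfies the criterion "
            "on a 0-1 scale and briefly justify the rating."
        )
        score, rationale = judge(prompt)  # any LLM backend plugs in here
        judgments.append(CriterionJudgment(criterion, score, rationale))
    overall = sum(j.score for j in judgments) / len(judgments)
    return overall, judgments

# Stub judge so the sketch runs end to end; swap in a real LLM call.
overall, details = evaluate_faithfulness(
    "The meeting is on Tuesday.",
    "The meeting is on Tuesday.",
    judge=lambda prompt: (1.0, "output matches the source"),
)
print(f"overall={overall:.2f}")
for j in details:
    print(f"  {j.criterion}: {j.score:.2f} ({j.rationale})")
```

The design point this sketch is meant to surface is that the criteria list stays fixed across languages, so adapting the evaluator to a new language requires only a judge that can read that language, not new target-language annotations.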