🤖 AI Summary
This work addresses the limited cross-lingual generalization capability of large language models (LLMs) in universal text quality assessment. To this end, we propose the first preference-learning-based multilingual text quality evaluation framework. It automatically generates high-quality preference data spanning 115 languages and trains open-source LLMs via supervised fine-tuning coupled with representation alignment—eliminating the need for human annotation. The framework significantly enhances the model’s ability to discriminate text quality across diverse languages and downstream tasks. Empirical results demonstrate substantial improvements over state-of-the-art methods on multilingual quality evaluation benchmarks. Moreover, it consistently boosts performance in key downstream applications, including machine translation and abstractive summarization. Our approach establishes a novel, scalable paradigm for universal multilingual text quality assessment.
📝 Abstract
The use of large language models (LLMs) for evaluating outputs is becoming an increasingly effective and scalable approach. However, it remains uncertain whether this capability extends beyond task-specific evaluations to more general assessments of text quality, particularly in multilingual contexts. In this study, we introduce MTQ-Eval, a novel framework for multilingual text quality evaluation that learns from examples of both high- and low-quality texts and adjusts its internal representations accordingly. To develop MTQ-Eval, we first automatically generate text quality preference data and then use it to train open-source base LLMs to align with ratings of high- and low-quality text. Our comprehensive evaluation across 115 languages demonstrates the improved performance of the proposed model. Upon further analysis, we find that this enhanced evaluation capability also leads to notable improvements in downstream tasks.
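The abstract describes training on automatically generated preference pairs of high- and low-quality text. The paper's exact objective is not given here; as a minimal illustrative sketch, preference learning over such pairs is commonly framed as a Bradley-Terry-style pairwise loss that pushes the score of the preferred (high-quality) text above the dispreferred one. All names below are hypothetical, not from the paper.

```python
import math

def pairwise_preference_loss(score_high: float, score_low: float) -> float:
    """Bradley-Terry pairwise loss (illustrative, not the paper's exact objective).

    Minimizing -log(sigmoid(score_high - score_low)) encourages the model
    to assign a higher quality score to the preferred text in each pair.
    """
    margin = score_high - score_low
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy preference pair: a correctly ordered pair incurs a lower loss
# than a misordered one, which is what drives the score gap apart.
well_ordered = pairwise_preference_loss(score_high=2.0, score_low=-1.0)
misordered = pairwise_preference_loss(score_high=-1.0, score_low=2.0)
print(well_ordered < misordered)  # True
```

In practice the scalar scores would come from an LLM head over each text, and the loss would be minimized jointly with the supervised fine-tuning and representation-alignment objectives the summary mentions.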