PerQ: Efficient Evaluation of Multilingual Text Personalization Quality

📅 2025-09-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing text personalization quality evaluation lacks dedicated metrics, relying predominantly on multi-LLM meta-evaluation—yet this approach suffers from model bias and prohibitive computational overhead. Method: We propose PerQ, a lightweight, multilingual-compatible automatic metric that quantifies personalization quality without human annotations. PerQ jointly models generation discrepancies across multiple large and small language models, integrating their meta-evaluative capabilities while incorporating a bias-correction mechanism. Contribution/Results: Experiments demonstrate that PerQ achieves strong agreement with human judgments across multilingual benchmarks (average Spearman’s ρ = 0.82), while reducing computational cost by 76% compared to conventional multi-LLM ensemble methods. This substantially improves evaluation efficiency and mitigates resource waste, enabling scalable, reliable, and equitable personalization assessment.
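The summary reports agreement with human judgments as an average Spearman's ρ. As a generic illustration (not the paper's implementation), the sketch below computes Spearman's ρ between a metric's scores and human ratings as the Pearson correlation of their rank transforms; the score lists are hypothetical and tie handling is simplified.

```python
def _ranks(values):
    # Assign ranks 1..n by value order (ties broken by position;
    # adequate for this illustration, not a full tie-averaging scheme).
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = float(r)
    return ranks

def spearman_rho(x, y):
    # Spearman's rho = Pearson correlation of the rank-transformed data.
    n = len(x)
    rx, ry = _ranks(x), _ranks(y)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical data: metric scores vs. human judgments for five texts.
metric_scores = [0.2, 0.5, 0.9, 0.4, 0.7]
human_scores = [1, 3, 5, 2, 4]
print(round(spearman_rho(metric_scores, human_scores), 2))  # → 1.0
```

In practice one would use `scipy.stats.spearmanr`, which also handles ties correctly; the hand-rolled version above only shows what the reported statistic measures.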

📝 Abstract
Since no metrics are available to evaluate specific aspects of a text, such as its personalization quality, researchers often rely solely on large language models to meta-evaluate such texts. Due to the internal biases of individual language models, it is recommended to combine several of them for evaluation, which directly increases the cost of such meta-evaluation. In this paper, we introduce PerQ, a computationally efficient method for evaluating the personalization quality of a text generated by a language model. A case study comparing the generation capabilities of large and small language models demonstrates the usability of the proposed metric in research, effectively reducing the waste of resources.
Problem

Research questions and friction points this paper is trying to address.

Evaluating text personalization quality without dedicated metrics
Reducing costs of multilingual meta-evaluation using multiple LLMs
Providing an efficient alternative to resource-intensive personalization assessment
Innovation

Methods, ideas, or system contributions that make the work stand out.

PerQ efficiently evaluates text personalization quality
Reduces reliance on multiple costly language models
Enables resource-efficient comparison of model capabilities