Universal Preference-Score-based Pairwise Speech Quality Assessment

📅 2025-06-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of reliably comparing speech generation systems’ performance. We propose a preference-score-driven pairwise speech quality assessment method: first, a deep regression model predicts absolute Mean Opinion Score (MOS) values for two speech samples; then, a differentiable preference function aggregates these absolute scores into a relative preference score. Our key contribution is the first principled decoupling and joint modeling of absolute quality estimation and relative preference learning. To support this, we construct the first large-scale, MOS-synthesized pairwise speech preference dataset, significantly enhancing generalization in low-data regimes. By integrating data distillation and synthetic data augmentation, our method consistently outperforms baselines across diverse training configurations and cross-domain evaluations, achieving up to a 12.7% improvement in preference prediction accuracy. Results demonstrate strong robustness and broad applicability.

📝 Abstract

One of the most effective ways to compare the performance of two speech generation systems is to estimate the preference score between their generated speech. This paper proposes a novel universal preference-score-based pairwise speech quality assessment (UPPSQA) model, which predicts the preference score between paired speech samples to determine which one has better quality. The model first predicts the absolute mean opinion score (MOS) of each of the two speech samples separately, and then aggregates the two scores into a relative preference score using a preference function. To address the scarcity of preference data, we also construct a new pairwise speech dataset from a MOS dataset for experiments. Experimental results confirm that UPPSQA achieves higher prediction accuracy than the baseline models across training scenarios with different data types and label conditions, and in both in-domain and out-of-domain test scenarios, demonstrating its universality.
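The two-stage idea above (absolute MOS per sample, then a preference function over the pair) can be illustrated with a minimal sketch. The abstract does not specify the form of the preference function, so the logistic of the scaled MOS difference used here is an assumption, and the names `preference_score`, `preferred`, and `scale` are hypothetical. The MOS values are passed in directly, standing in for the output of a MOS predictor.

```python
import math


def preference_score(mos_a: float, mos_b: float, scale: float = 1.0) -> float:
    """Map two absolute MOS predictions to a preference score in (0, 1).

    Assumption: a logistic function of the scaled MOS difference.
    The paper's actual preference function is not given in this summary.
    A value above 0.5 means sample A is preferred over sample B.
    """
    return 1.0 / (1.0 + math.exp(-scale * (mos_a - mos_b)))


def preferred(mos_a: float, mos_b: float) -> str:
    """Return which sample is preferred, or "tie" when the scores are equal."""
    score = preference_score(mos_a, mos_b)
    if score > 0.5:
        return "A"
    if score < 0.5:
        return "B"
    return "tie"


if __name__ == "__main__":
    # Hypothetical MOS predictions for two speech samples.
    print(preferred(4.1, 3.4))
    print(round(preference_score(4.1, 3.4), 3))
```

With this choice the score is antisymmetric around 0.5: swapping the two samples gives one minus the original score, so the decision does not depend on the order in which the pair is presented.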

Problem

Research questions and friction points this paper is trying to address.

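Predicting the preference score between paired speech samples to decide which has better quality

Scarcity of preference data, addressed with a new pairwise dataset built from a MOS dataset

Outperforming baselines across diverse training conditions and in-domain and out-of-domain test scenarios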

Innovation

Methods, ideas, or system contributions that make the work stand out.

Predicts preference score between paired speech samples

Uses absolute MOS and preference function aggregation

Constructs new pairwise dataset from MOS data
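One way to derive pairwise preference labels from an existing MOS dataset is sketched below. The summary does not describe the paper's actual construction procedure, so the pairing rule (all combinations, skipping ties and near-ties) is an assumption, and `build_pairs`, `min_gap`, and the utterance IDs are hypothetical.

```python
from itertools import combinations


def build_pairs(mos_by_utt: dict[str, float], min_gap: float = 0.0):
    """Turn per-utterance MOS labels into (utt_a, utt_b, label) triples.

    Assumption: every unordered pair is considered, and pairs whose MOS gap
    is at most `min_gap` are skipped because the preference is ambiguous.
    label is 1 when utt_a has the higher MOS, otherwise 0.
    """
    pairs = []
    # Sort by utterance ID so the output order is deterministic.
    for (utt_a, mos_a), (utt_b, mos_b) in combinations(sorted(mos_by_utt.items()), 2):
        if abs(mos_a - mos_b) <= min_gap:
            continue
        pairs.append((utt_a, utt_b, 1 if mos_a > mos_b else 0))
    return pairs


if __name__ == "__main__":
    # Hypothetical MOS labels for three utterances.
    mos = {"utt1": 4.2, "utt2": 3.1, "utt3": 3.1}
    for pair in build_pairs(mos):
        print(pair)
```

Because the number of pairs grows quadratically with the number of utterances, a real pipeline would likely sample pairs rather than enumerate them all; that choice is left out of this sketch.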

🔎 Similar Papers

SCOREQ: Speech Quality Assessment with Contrastive Regression