Universal Preference-Score-based Pairwise Speech Quality Assessment

2025-06-02
AI Summary
This work addresses the challenge of reliably comparing speech generation systems' performance. We propose a preference-score-driven pairwise speech quality assessment method: first, a deep regression model predicts absolute Mean Opinion Score (MOS) values for two speech samples; then, a differentiable preference function aggregates these absolute scores into a relative preference score. Our key contribution is the first principled decoupling and joint modeling of absolute quality estimation and relative preference learning. To support this, we construct the first large-scale, MOS-synthesized pairwise speech preference dataset, significantly enhancing generalization in low-data regimes. By integrating data distillation and synthetic data augmentation, our method consistently outperforms baselines across diverse training configurations and cross-domain evaluations, achieving up to a 12.7% improvement in preference prediction accuracy. Results demonstrate strong robustness and broad applicability.

Abstract
To compare two speech generation systems, one of the most effective approaches is to estimate the preference score between their generated speech. This paper proposes a novel universal preference-score-based pairwise speech quality assessment (UPPSQA) model, which predicts the preference score between paired speech samples to determine which one has better quality. The model first predicts the absolute mean opinion score (MOS) of each of the two speech samples separately, and then aggregates the two scores into a relative preference score using a preference function. To address the scarcity of preference data, we also construct a new pairwise speech dataset from a MOS dataset for experiments. Experimental results confirm that UPPSQA achieves higher prediction accuracy than the baseline models across training scenarios with different data types and label conditions, and in both in-domain and out-of-domain test scenarios, demonstrating its universality.
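
The two-stage idea in the abstract (predict an absolute MOS per sample, then aggregate the two into a relative preference score) can be illustrated with a minimal sketch. The abstract does not specify the preference function, so the logistic function of the MOS difference used here, the `scale` parameter, and the function names are assumptions chosen for illustration only; the MOS values would come from a trained predictor that is not shown.

```python
import math


def preference_score(mos_a: float, mos_b: float, scale: float = 1.0) -> float:
    """Map two absolute MOS predictions to a preference score in (0, 1).

    Assumption: a logistic function of the MOS difference. The paper's actual
    preference function may differ. Values above 0.5 favor sample A.
    """
    return 1.0 / (1.0 + math.exp(-scale * (mos_a - mos_b)))


def preferred(mos_a: float, mos_b: float) -> str:
    """Return "A" if sample A is predicted to be better, otherwise "B".

    Ties (a score of exactly 0.5) fall to "B" in this sketch.
    """
    return "A" if preference_score(mos_a, mos_b) > 0.5 else "B"
```

With this choice the score is antisymmetric around 0.5: swapping the two samples turns a score of p into 1 - p, so the decision does not depend on the order in which the pair is presented.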
Problem

Research questions and friction points this paper is trying to address.

Predicts preference score between paired speech samples
Addresses scarcity of preference data with new dataset
Outperforms baselines in diverse training and test scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Predicts preference score between paired speech samples
Uses absolute MOS and preference function aggregation
Constructs new pairwise dataset from MOS data
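
One way a pairwise dataset could be derived from a MOS dataset is sketched below. The pairing strategy (all combinations of utterances), the `min_gap` threshold for dropping ties and near-ties, and the binary label convention are assumptions made for illustration; the page does not describe the authors' actual construction procedure.

```python
from itertools import combinations


def build_pairs(mos_items, min_gap=0.0):
    """Turn (utterance_id, mos) records into labeled preference pairs.

    Each returned tuple is (id_a, id_b, label), where label is 1.0 when
    sample A has the higher MOS and 0.0 when sample B does. Pairs whose MOS
    difference is at most `min_gap` are skipped (hypothetical rule).
    """
    pairs = []
    for (id_a, mos_a), (id_b, mos_b) in combinations(mos_items, 2):
        if abs(mos_a - mos_b) <= min_gap:
            continue  # no clear preference, so the pair carries no label
        pairs.append((id_a, id_b, 1.0 if mos_a > mos_b else 0.0))
    return pairs
```

For example, `build_pairs([("u1", 4.0), ("u2", 3.0), ("u3", 4.0)])` keeps the pairs (u1, u2) and (u2, u3) and drops the tied pair (u1, u3).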
Yu-Fei Shi
National Engineering Research Center of Speech and Language Information Processing, University of Science and Technology of China, Hefei, P. R. China
Yang Ai
Associate Researcher, University of Science and Technology of China
Speech Synthesis, Speech Enhancement, Speech Coding, Deep Learning
Zhenhua Ling
National Engineering Research Center of Speech and Language Information Processing, University of Science and Technology of China, Hefei, P. R. China