Unifying Listener Scoring Scales: Comparison Learning Framework for Speech Quality Assessment and Continuous Speech Emotion Recognition

📅 2025-07-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Subjective bias arising from inter-rater scale variability hinders performance in speech quality assessment (SQA) and continuous speech emotion recognition (CSER). Method: We propose a unified listener-scale modeling framework based on pairwise comparison learning, which abandons conventional mean aggregation or per-rater scale modeling. Instead, it explicitly learns a shared scale representation by leveraging ordinal relationships among sentence-level relative ratings, thereby enforcing cross-rater comparability. The model is trained end-to-end via comparison learning to preserve subjectivity while mitigating scale-induced bias. Contribution/Results: The approach significantly improves generalization across listeners and tasks. Experiments demonstrate state-of-the-art performance on both SQA and CSER benchmark datasets, validating its effectiveness, robustness, and task-agnostic applicability.

📝 Abstract
Speech Quality Assessment (SQA) and Continuous Speech Emotion Recognition (CSER) are two key tasks in speech technology, both relying on listener ratings. However, these ratings are inherently biased by individual listener factors. Previous approaches have introduced a mean listener scoring scale or modeled all listener scoring scales in the training set. However, the mean listener approach is prone to distortion from averaging ordinal data, leading to potential biases. Moreover, learning multiple listener scoring scales while inferring only with the mean listener scale limits effectiveness. In contrast, our method models a unified listener scoring scale, using comparison scores to correctly capture the scoring relationships between utterances. Experimental results show that our method effectively improves prediction performance in both SQA and CSER tasks, demonstrating its effectiveness and robustness.
Problem

Research questions and friction points this paper is trying to address.

Unifying biased listener scoring scales in speech assessment
Improving accuracy in Speech Quality Assessment tasks
Enhancing performance in Continuous Speech Emotion Recognition
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified listener scoring scale modeling
Comparison scores for scoring relationships
Improved prediction in SQA and CSER
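The core idea above, learning from ordinal relationships between utterances rather than from raw listener scores, can be sketched as a simple pairwise hinge loss. This is an illustrative sketch only: the function names, the margin value, and the pair-construction rule are assumptions, not details taken from the paper.

```python
# Hypothetical sketch of pairwise comparison learning over listener ratings.
# All names and the margin value are illustrative, not from the paper.

def pairs_from_ratings(ratings):
    """Build ordered pairs (i, j) where one listener rated utterance i
    strictly higher than utterance j; ties carry no ordinal information."""
    n = len(ratings)
    return [(i, j) for i in range(n) for j in range(n)
            if ratings[i] > ratings[j]]

def comparison_loss(pred, pairs, margin=0.5):
    """Hinge-style pairwise loss: for each pair (i, j), penalize model
    predictions that fail to keep pred[i] above pred[j] by `margin`."""
    total = 0.0
    for i, j in pairs:
        total += max(0.0, margin - (pred[i] - pred[j]))
    return total / len(pairs)

# Example: one listener's sentence-level ratings on an arbitrary scale.
ratings = [4, 2, 3]
pairs = pairs_from_ratings(ratings)        # [(0, 1), (0, 2), (2, 1)]
loss = comparison_loss([3.0, 1.0, 2.0], pairs)  # ordering preserved -> 0.0
```

Because the loss depends only on rating *order* within a listener, two listeners who use different numeric scales but agree on which utterance sounds better contribute consistent training signal, which is the cross-rater comparability the summary describes.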
Cheng-Hung Hu
Academia Sinica
Yusuke Yasuda
Digital Content and Media Sciences Research Division, National Institute of Informatics, Japan
Akifumi Yoshimoto
CyberAgent, Japan
Tomoki Toda
Nagoya University
Signal Processing · Speech Processing · Speech Synthesis