Few-Shot and Pseudo-Label Guided Speech Quality Evaluation with Large Language Models

📅 2026-04-15
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work proposes GatherMOS, a non-intrusive speech quality assessment method for data-scarce scenarios that uses a large language model (LLM) as a meta-evaluator. Through zero-shot or few-shot in-context learning, the LLM combines lightweight acoustic descriptors with pseudo-labels from existing models such as DNSMOS and VQScore to predict perceptual Mean Opinion Scores (MOS). Zero-shot prompting gives stable performance across diverse conditions, while few-shot prompting yields large gains when the support samples match the test conditions. On the VoiceBank-DEMAND dataset, the approach outperforms DNSMOS, VQScore, and naive score averaging, as well as the learning-based CNN-BLSTM and MOS-SSL models when those are trained with limited labeled data.

📝 Abstract
In this paper, we introduce GatherMOS, a novel framework that leverages large language models (LLMs) as meta-evaluators to aggregate diverse signals into quality predictions. GatherMOS integrates lightweight acoustic descriptors with pseudo-labels from DNSMOS and VQScore, enabling the LLM to reason over heterogeneous inputs and infer perceptual mean opinion scores (MOS). We further explore both zero-shot and few-shot in-context learning setups, showing that zero-shot GatherMOS maintains stable performance across diverse conditions, while few-shot guidance yields large gains when support samples match the test conditions. Experiments on the VoiceBank-DEMAND dataset demonstrate that GatherMOS consistently outperforms DNSMOS, VQScore, naive score averaging, and even learning-based models such as CNN-BLSTM and MOS-SSL when those models are trained under limited labeled-data conditions. These results highlight the potential of LLM-based aggregation as a practical strategy for non-intrusive speech quality evaluation.
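
As an illustration of the aggregation idea described above, the following is a minimal sketch of how acoustic descriptors and pseudo-labels might be serialized into a zero-shot or few-shot prompt. It is not the authors' implementation: the `Utterance` container, the descriptor names, the prompt wording, and the 1.0 to 5.0 clamping are all assumptions made for this example, and the call to an actual LLM is left out.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Utterance:
    """Hypothetical container for one utterance (field names are assumptions)."""
    descriptors: Dict[str, float]  # lightweight acoustic descriptors, e.g. {"rms_db": -23.1}
    dnsmos: float                  # pseudo-label produced by DNSMOS
    vqscore: float                 # pseudo-label produced by VQScore
    mos: Optional[float] = None    # human MOS, known only for support samples


def format_utterance(u: Utterance) -> str:
    """Render descriptors and pseudo-labels as plain text for the prompt."""
    feats = ", ".join(f"{k}={v:.3f}" for k, v in sorted(u.descriptors.items()))
    return f"Descriptors: {feats}\nDNSMOS: {u.dnsmos:.2f}\nVQScore: {u.vqscore:.2f}"


def build_prompt(query: Utterance, support: List[Utterance]) -> str:
    """Build a prompt; an empty support list gives the zero-shot variant."""
    parts = ["Predict the perceptual MOS (1.0 to 5.0) of the final utterance. "
             "Answer with a single number."]
    for s in support:
        if s.mos is None:
            raise ValueError("support samples need a labeled MOS")
        parts.append(f"{format_utterance(s)}\nMOS: {s.mos:.2f}")
    # The query block ends with an open "MOS:" slot for the model to complete.
    parts.append(f"{format_utterance(query)}\nMOS:")
    return "\n\n".join(parts)


def parse_mos(reply: str) -> Optional[float]:
    """Take the first number in the reply and clamp it to the MOS range."""
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match is None:
        return None
    return min(5.0, max(1.0, float(match.group())))
```

In this sketch, `build_prompt(query, [])` corresponds to the zero-shot setup, and passing a few labeled `Utterance` objects as `support` corresponds to the few-shot setup. The returned string would be sent to whichever LLM is used, and `parse_mos` would be applied to its reply.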
Problem

Research questions and friction points this paper is trying to address.

speech quality evaluation
few-shot learning
pseudo-label
large language models
non-intrusive
Innovation

Methods, ideas, or system contributions that make the work stand out.

large language models
speech quality evaluation
few-shot learning
pseudo-labeling
non-intrusive MOS prediction