URGENT-PK: Perceptually-Aligned Ranking Model Designed for Speech Enhancement Competition

📅 2025-06-30
🤖 AI Summary
Subjective MOS scoring is costly and labor-intensive, which hinders scalable evaluation of speech enhancement systems. To address this, the paper proposes URGENT-PK, a pairwise ranking model that learns the relative perceptual quality order of enhanced speech pairs derived from the same source utterance. Because every pairwise permutation of the competing systems' outputs yields a training instance, a limited amount of annotated data produces many training pairs. Across multiple open test sets, URGENT-PK achieves better system-level ranking performance than state-of-the-art baselines, despite its simple network architecture and limited training data.

📝 Abstract
The Mean Opinion Score (MOS) is fundamental to speech quality assessment. However, its acquisition requires significant human annotation. Although deep neural network approaches, such as DNSMOS and UTMOS, have been developed to predict MOS and thereby avoid this cost, they often suffer from insufficient training data. Recognizing that the comparison of speech enhancement (SE) systems prioritizes a reliable system ranking over absolute scores, we propose URGENT-PK, a novel ranking approach leveraging pairwise comparisons. URGENT-PK takes homologous enhanced speech pairs (outputs of different systems for the same source utterance) as input to predict relative quality rankings. This pairwise paradigm efficiently utilizes limited training data, as each pairwise permutation of the systems' outputs constitutes a training instance. Experiments across multiple open test sets demonstrate URGENT-PK's superior system-level ranking performance over state-of-the-art baselines, despite its simple network architecture and limited training data.
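To illustrate why the pairwise paradigm stretches limited annotations, here is a minimal sketch of how training pairs might be built from MOS labels for one source utterance. The function name `build_pairs`, the system names, the MOS values, and the 0.5 label for ties are all assumptions made for illustration, not details taken from the paper.

```python
from itertools import permutations


def build_pairs(mos_by_system):
    """Turn MOS labels for one source utterance into ordered training pairs.

    mos_by_system maps a (hypothetical) system name to the MOS of its
    enhanced output for the same source utterance.
    Returns (sys_a, sys_b, label) tuples, where label is 1.0 if sys_a was
    rated higher, 0.0 if lower, and 0.5 for a tie (an assumed convention).
    """
    pairs = []
    for a, b in permutations(mos_by_system, 2):
        diff = mos_by_system[a] - mos_by_system[b]
        label = 1.0 if diff > 0 else 0.0 if diff < 0 else 0.5
        pairs.append((a, b, label))
    return pairs


# Illustrative MOS values only; not data from the paper.
example = {"sys_A": 3.8, "sys_B": 3.1, "sys_C": 2.4}
pairs = build_pairs(example)
```

With N systems rated on the same utterance, this yields N(N-1) ordered pairs, so three labeled outputs already give six training instances.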
Problem

Research questions and friction points this paper is trying to address.

Predicting speech quality rankings with limited training data
Overcoming human annotation dependency for MOS assessment
Enhancing system-level comparison accuracy in speech enhancement
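
One way system-level comparison accuracy could be quantified is a rank-agreement score between reference MOS and predicted system scores. The sketch below uses a simplified Kendall-style statistic that skips tied pairs; the choice of metric, the name `rank_agreement`, and the numbers are assumptions for illustration, and the paper may report different measures.

```python
from itertools import combinations


def rank_agreement(reference, predicted):
    """Simplified Kendall-style agreement between two system-level scorings.

    Both arguments map system name -> score. Each pair of systems counts as
    concordant if both scorings order it the same way and discordant if they
    disagree; pairs tied in either scoring are skipped.
    Returns a value in [-1, 1], or 0.0 if no pair can be compared.
    """
    concordant = discordant = 0
    for a, b in combinations(list(reference), 2):
        product = (reference[a] - reference[b]) * (predicted[a] - predicted[b])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    total = concordant + discordant
    return (concordant - discordant) / total if total else 0.0


# Illustrative values only: the prediction swaps sys_B and sys_C.
reference_mos = {"sys_A": 3.8, "sys_B": 3.1, "sys_C": 2.4}
predicted_score = {"sys_A": 0.85, "sys_B": 0.25, "sys_C": 0.40}
agreement = rank_agreement(reference_mos, predicted_score)
```

Here two of the three system pairs are ordered correctly and one is swapped, so the score is (2 - 1) / 3.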
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pairwise comparisons for ranking speech quality
Homologous speech pairs as input data
Efficient use of limited training data
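
A pairwise model outputs preferences, not a ranking, so some aggregation step is needed to compare whole systems. The following is a minimal sketch of one plausible aggregation, an average expected win rate per system; the function `rank_systems` and the probabilities are hypothetical, and the source does not state which aggregation URGENT-PK actually uses.

```python
from collections import defaultdict


def rank_systems(pairwise_probs):
    """Rank systems by average expected win rate.

    pairwise_probs is an iterable of (sys_a, sys_b, p) tuples, where p is a
    predicted probability that sys_a sounds better than sys_b on some
    utterance. Returns system names sorted from best to worst.
    """
    wins = defaultdict(float)
    counts = defaultdict(int)
    for a, b, p in pairwise_probs:
        wins[a] += p          # credit sys_a with its win probability
        wins[b] += 1.0 - p    # and sys_b with the complement
        counts[a] += 1
        counts[b] += 1
    average = {s: wins[s] / counts[s] for s in wins}
    return sorted(average, key=average.get, reverse=True)


# Illustrative predictions only; not outputs of the actual model.
preds = [
    ("sys_A", "sys_B", 0.9),
    ("sys_A", "sys_C", 0.8),
    ("sys_B", "sys_C", 0.7),
]
ranking = rank_systems(preds)
```

In this example the average win rates are 0.85, 0.40, and 0.25, so the systems are ordered sys_A, sys_B, sys_C.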