🤖 AI Summary
Subjective listening tests remain the bottleneck for evaluating speech quality of neural audio codecs at low bitrates. Method: We systematically benchmark mainstream objective metrics, including PESQ, STOI, and DNSMOS, against human perception using standardized MUSHRA subjective test results, quantifying their agreement with mean opinion scores via Pearson's correlation coefficient. Contribution/Results: Traditional metrics (e.g., PESQ) exhibit markedly degraded performance under neural codec distortions, whereas DNSMOS and novel time-frequency domain metrics achieve superior correlation (r > 0.85). We are the first to characterize the differential sensitivity of objective metrics to neural-specific artifacts, such as spectral smearing and temporal aliasing, and to propose an empirically grounded, optimized metric combination with clearly defined applicability boundaries for neural audio codecs. This work provides evidence-based guidelines for automated, reproducible speech quality assessment in neural codec development and evaluation.
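The correlation analysis described above can be sketched in a few lines of Python. The example below computes Pearson's r between per-condition objective metric scores and MUSHRA mean opinion scores via `scipy.stats.pearsonr`; the data arrays are hypothetical placeholders, not values from the paper.

```python
# Minimal sketch of correlating an objective metric with MUSHRA scores.
# The arrays below are hypothetical example data (one value per codec condition).
import numpy as np
from scipy.stats import pearsonr

metric_scores = np.array([3.1, 2.4, 3.8, 1.9, 3.5])    # e.g., DNSMOS per condition
mushra_mos = np.array([72.0, 55.0, 86.0, 40.0, 80.0])  # MUSHRA mean opinion scores (0-100)

# Pearson's correlation coefficient between objective and subjective scores.
r, p_value = pearsonr(metric_scores, mushra_mos)
print(f"Pearson r = {r:.3f} (p = {p_value:.3g})")
```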
📄 Abstract
Neural audio codecs have recently gained popularity for their use in generative modeling, as they offer high-fidelity audio reconstruction at low bitrates. While human listening studies remain the gold standard for assessing perceptual quality, they are time-consuming and impractical to run at scale. In this work, we examine the reliability of existing objective quality metrics in assessing the performance of recent neural audio codecs. To this end, we conduct a MUSHRA listening test on high-fidelity speech signals and analyze the correlation between subjective scores and widely used objective metrics. Our results show that, while some metrics align well with human perception, others struggle to capture relevant distortions. Our findings provide practical guidance for selecting appropriate evaluation metrics when using neural audio codecs for speech.
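As a concrete starting point, the sketch below scores a codec reconstruction against its clean reference with two of the metrics studied here, using the third-party `pesq` and `pystoi` packages (`pip install pesq pystoi soundfile`). The file names are illustrative assumptions, not part of the paper's evaluation pipeline.

```python
# Sketch: computing PESQ and STOI for a reference/degraded speech pair.
# File names are hypothetical; both signals are assumed mono at the same rate.
import soundfile as sf
from pesq import pesq
from pystoi import stoi

ref, fs = sf.read("reference.wav")  # clean reference speech
deg, _ = sf.read("decoded.wav")     # neural-codec reconstruction

# PESQ wideband mode expects 16 kHz input; STOI accepts the signal's native rate.
print("PESQ (wb):", pesq(fs, ref, deg, "wb"))
print("STOI:", stoi(ref, deg, fs, extended=False))
```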