Enhanced Generative Machine Listener

📅 2025-09-25
🤖 AI Summary
This work addresses the limited accuracy of subjective audio quality prediction, specifically of MUSHRA scores, by proposing GMLv2, a reference-based model trained with a Beta distribution-based loss. The loss explicitly models the bounded, continuous nature of MUSHRA listener ratings, and training additionally draws on subjective rating datasets for neural audio coding (NAC) to improve generalization across audio content and codec configurations. The main contributions are the Beta distribution-based loss for audio quality regression and the broader subjective training data. Evaluations on diverse test sets show that GMLv2 consistently outperforms widely used objective metrics, including PEAQ and ViSQOL, both in correlation with subjective scores and in reliably predicting those scores across content types and codec configurations. The result is a scalable, automated approach to perceptual quality evaluation for audio codec development.

📝 Abstract
We present GMLv2, a reference-based model designed for the prediction of subjective audio quality as measured by MUSHRA scores. GMLv2 introduces a Beta distribution-based loss to model the listener ratings and incorporates additional neural audio coding (NAC) subjective datasets to extend its generalization and applicability. Extensive evaluations on diverse test sets demonstrate that GMLv2 consistently outperforms widely used metrics, such as PEAQ and ViSQOL, both in terms of correlation with subjective scores and in reliably predicting these scores across diverse content types and codec configurations. Consequently, GMLv2 offers a scalable and automated framework for perceptual audio quality evaluation, poised to accelerate research and development in modern audio coding technologies.

Problem

Research questions and friction points this paper is trying to address.

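Predicting subjective audio quality, as measured by MUSHRA scores, with a reference-based model
Modeling bounded listener ratings with a suitable loss function
Improving on PEAQ and ViSQOL in correlation with subjective scores and in reliability across content types and codec configurations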
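
The comparison against PEAQ and ViSQOL depends on how closely predicted scores track subjective MUSHRA scores. Below is a minimal sketch of two common agreement measures, Pearson correlation and root-mean-square error, assuming per-item scores on the 0-100 MUSHRA scale. The function names and the numbers in the usage lines are hypothetical illustrations. They are not results from the paper, and they do not reproduce its evaluation protocol.

```python
import math


def pearson(predicted, subjective):
    """Pearson correlation between predicted and subjective scores."""
    if len(predicted) != len(subjective) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_s = sum(subjective) / n
    cov = sum((p - mean_p) * (s - mean_s) for p, s in zip(predicted, subjective))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_s = sum((s - mean_s) ** 2 for s in subjective)
    if var_p == 0 or var_s == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_s)


def rmse(predicted, subjective):
    """Root-mean-square error, in MUSHRA points."""
    if len(predicted) != len(subjective) or not predicted:
        raise ValueError("need two equal-length, non-empty sequences")
    n = len(predicted)
    return math.sqrt(sum((p - s) ** 2 for p, s in zip(predicted, subjective)) / n)


# Hypothetical example values (not taken from the paper).
subjective_scores = [92.0, 75.0, 48.0, 30.0]
predicted_scores = [88.0, 79.0, 52.0, 25.0]
print(pearson(predicted_scores, subjective_scores))
print(rmse(predicted_scores, subjective_scores))
```

Correlation rewards getting the ranking and trend right, while RMSE penalizes offsets in the absolute score, so reporting both covers the two aspects named above.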

Innovation

Methods, ideas, or system contributions that make the work stand out.

Beta distribution-based loss for listener ratings
Incorporates neural audio coding subjective datasets
Automated framework for perceptual audio quality evaluation
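
To make the Beta distribution-based loss concrete, here is a minimal sketch of a Beta negative log-likelihood for a single MUSHRA rating. It assumes the 0-100 score is rescaled to the open interval (0, 1) and that a model outputs two positive shape parameters, here called `alpha` and `beta`. That parametrization, the clamping constant `eps`, and the function names are assumptions for illustration; the summary does not specify how GMLv2 parametrizes or trains its loss.

```python
import math


def beta_nll(score, alpha, beta, eps=1e-4):
    """Negative log-likelihood of one MUSHRA score (0-100) under Beta(alpha, beta).

    Assumption: alpha and beta are positive shape parameters predicted by a model.
    """
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    # Rescale to (0, 1) and clamp so that log() stays finite at scores 0 and 100.
    y = min(max(score / 100.0, eps), 1.0 - eps)
    # log B(alpha, beta), the normalizing constant of the Beta density.
    log_norm = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    log_pdf = (alpha - 1.0) * math.log(y) + (beta - 1.0) * math.log(1.0 - y) - log_norm
    return -log_pdf


def beta_mean_score(alpha, beta):
    """Point prediction on the 0-100 scale: the mean of Beta(alpha, beta)."""
    return 100.0 * alpha / (alpha + beta)


# A distribution centred near 80 assigns a lower loss to a rating of 80 than to 20.
print(beta_nll(80.0, 8.0, 2.0) < beta_nll(20.0, 8.0, 2.0))
print(beta_mean_score(8.0, 2.0))
```

Because the Beta density is supported only on the unit interval, a loss of this form cannot place probability mass outside the valid rating range, which a Gaussian or plain squared-error objective does not guarantee.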