Enhanced Generative Machine Listener

📅 2025-09-25

🤖 AI Summary

This work addresses the limited accuracy of subjective audio quality prediction, particularly for MUSHRA scores, by training a reference-based model with a Beta-distribution loss. MUSHRA scores are continuous and bounded (0 to 100), and the Beta distribution models that bounded range directly. The training data is extended with subjective rating datasets for neural audio coding (NAC), which is intended to improve generalization across audio content and codec configurations. In the reported experiments, the method consistently outperforms conventional objective metrics, including PEAQ and ViSQOL, both in correlation with subjective scores and in the reliability of its predictions across content types and codec configurations. The result is an automated, more broadly applicable approach to perceptual quality prediction for audio codec development.

📝 Abstract

We present GMLv2, a reference-based model designed for the prediction of subjective audio quality as measured by MUSHRA scores. GMLv2 introduces a Beta distribution-based loss to model the listener ratings and incorporates additional neural audio coding (NAC) subjective datasets to extend its generalization and applicability. Extensive evaluations on diverse test sets demonstrate that the proposed GMLv2 consistently outperforms widely used metrics, such as PEAQ and ViSQOL, both in terms of correlation with subjective scores and in reliably predicting these scores across diverse content types and codec configurations. Consequently, GMLv2 offers a scalable and automated framework for perceptual audio quality evaluation, poised to accelerate research and development in modern audio coding technologies.
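
To make the idea of a Beta distribution-based loss concrete, here is a minimal sketch of a Beta negative log-likelihood for a single rating. The abstract does not give the paper's parameterization, so several things here are assumptions made for illustration: the function names `beta_nll` and `beta_mean_score`, the rescaling of a 0 to 100 MUSHRA score to the unit interval, the clamping constant `eps`, and the choice to have the model output the two shape parameters `alpha` and `beta` directly.

```python
import math


def beta_nll(score, alpha, beta, eps=1e-4):
    """Negative log-likelihood of one MUSHRA score under Beta(alpha, beta).

    Assumption: `score` is on the 0-100 MUSHRA scale and the model has
    predicted positive shape parameters `alpha` and `beta`.
    """
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    # Rescale to (0, 1) and clamp so that log() stays finite at 0 and 100.
    y = min(max(score / 100.0, eps), 1.0 - eps)
    # log B(alpha, beta), the normalizing constant of the Beta density.
    log_norm = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    log_pdf = (alpha - 1.0) * math.log(y) + (beta - 1.0) * math.log(1.0 - y) - log_norm
    return -log_pdf


def beta_mean_score(alpha, beta):
    """Point prediction on the 0-100 scale: the mean of Beta(alpha, beta)."""
    return 100.0 * alpha / (alpha + beta)


if __name__ == "__main__":
    # Hypothetical predicted shape parameters for one test item.
    a, b = 8.0, 2.0
    print("predicted score:", beta_mean_score(a, b))
    print("loss if listeners rated 80:", beta_nll(80.0, a, b))
    print("loss if listeners rated 20:", beta_nll(20.0, a, b))
```

Under these assumptions, a rating near the predicted mean yields a lower loss than a rating far from it, and the density never assigns mass outside the valid score range, which is the property that motivates a bounded distribution over an unbounded one such as a Gaussian.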

Problem

Research questions and friction points this paper is trying to address.

Predicts subjective audio quality as measured by MUSHRA scores

Models listener ratings with Beta distribution-based loss

Outperforms PEAQ and ViSQOL in correlation and reliability
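
Correlation with subjective scores is commonly measured with the Pearson coefficient between a metric's predictions and the mean listener ratings. The sketch below shows that computation in plain Python. The function name `pearson` and the numbers in the usage section are hypothetical placeholders chosen for illustration, not results from the paper.

```python
import math


def pearson(predicted, subjective):
    """Pearson correlation between predicted and subjective scores."""
    if len(predicted) != len(subjective) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_s = sum(subjective) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((p - mean_p) * (s - mean_s) for p, s in zip(predicted, subjective))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_s = sum((s - mean_s) ** 2 for s in subjective)
    if var_p == 0.0 or var_s == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_s)


if __name__ == "__main__":
    # Hypothetical placeholder data: mean MUSHRA ratings for five items
    # and the scores a quality metric predicted for the same items.
    mushra = [22.0, 41.0, 58.0, 77.0, 93.0]
    metric = [30.0, 38.0, 61.0, 70.0, 88.0]
    print("Pearson r:", pearson(metric, mushra))
```

A metric is usually compared against alternatives by computing this coefficient for each of them on the same set of listening-test items.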

Innovation

Methods, ideas, or system contributions that make the work stand out.

Beta distribution-based loss for listener ratings

Incorporates neural audio coding subjective datasets

Automated framework for perceptual audio quality evaluation
