HAAQI-Net: A Non-intrusive Neural Music Audio Quality Assessment Model for Hearing Aids

📅 2024-01-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the difficulty of assessing music quality for hearing aid users and two limitations of existing metrics, reliance on a reference signal and high computational cost, by proposing HAAQI-Net, a non-intrusive, end-to-end music quality assessment model. The model feeds BEATs pretrained audio representations into an attention-augmented bidirectional LSTM and predicts Hearing Aid Audio Quality Index (HAAQI) scores from a music clip and a hearing loss pattern. It is then fine-tuned to predict Mean Opinion Score (MOS), compressed by knowledge distillation (75.85% fewer parameters, 96.46% shorter inference time), and evaluated for robustness across Sound Pressure Levels (SPL). For HAAQI prediction the model achieves LCC = 0.9368, SRCC = 0.9486, and MSE = 0.0064; the distilled model retains LCC = 0.9071, SRCC = 0.9307, and MSE = 0.0091. Fine-tuning significantly improves MOS prediction accuracy, and in the SPL evaluation accuracy peaks at the 65 dB reference level and declines gradually as SPL deviates from it.
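The page does not state which distillation objective was used, so the following is only a minimal sketch of one common form for score regression: the student is penalized both for missing the ground-truth HAAQI score and for deviating from the teacher's prediction. The function name `distillation_loss` and the weight `alpha` are hypothetical and chosen for illustration.

```python
def distillation_loss(student_pred, teacher_pred, target, alpha=0.5):
    """Blend of two mean squared errors (an assumed, generic formulation).

    student_pred: scores predicted by the small student model
    teacher_pred: scores predicted by the large teacher model
    target:       ground-truth HAAQI scores
    alpha:        hypothetical weight on the ground-truth term (0..1)
    """
    n = len(target)
    # "Hard" term: student versus the ground-truth score.
    hard = sum((s - t) ** 2 for s, t in zip(student_pred, target)) / n
    # "Soft" term: student versus the teacher's prediction.
    soft = sum((s - q) ** 2 for s, q in zip(student_pred, teacher_pred)) / n
    return alpha * hard + (1 - alpha) * soft
```

With `alpha = 1.0` this reduces to ordinary supervised training; lower values lean more heavily on the teacher's outputs.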

📝 Abstract

This paper introduces HAAQI-Net, a non-intrusive deep learning-based music audio quality assessment model for hearing aid users. Unlike traditional methods such as the Hearing Aid Audio Quality Index (HAAQI), which require intrusive comparison against a reference signal, HAAQI-Net offers a more accessible and computationally efficient alternative. Using a Bidirectional Long Short-Term Memory (BLSTM) architecture with attention mechanisms and features extracted from the pre-trained BEATs model, it predicts HAAQI scores directly from music audio clips and hearing loss patterns. Experimental results demonstrate HAAQI-Net's effectiveness: it achieves a Linear Correlation Coefficient (LCC) of 0.9368, a Spearman's Rank Correlation Coefficient (SRCC) of 0.9486, and a Mean Squared Error (MSE) of 0.0064, while inference time drops from 62.52 to 2.54 seconds. To address computational overhead, a knowledge distillation strategy was applied, reducing parameters by 75.85% and inference time by 96.46% while maintaining strong performance (LCC: 0.9071, SRCC: 0.9307, MSE: 0.0091). To expand its capabilities, HAAQI-Net was adapted through fine-tuning to predict subjective human scores such as the Mean Opinion Score (MOS). This adaptation significantly improved prediction accuracy, as validated through statistical analysis. Furthermore, the robustness of HAAQI-Net was evaluated under varying Sound Pressure Level (SPL) conditions, revealing optimal performance at a reference SPL of 65 dB, with accuracy gradually decreasing as SPL deviated from this point. The advancements in subjective score prediction, SPL robustness, and computational efficiency position HAAQI-Net as a scalable solution for music audio quality assessment in hearing aid applications, contributing to efficient and accurate models in audio signal processing and hearing aid technology.
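The three evaluation metrics reported in the abstract (LCC, SRCC, MSE) can be computed as in the sketch below. It uses only the standard library; the helper names and the toy score lists are illustrative assumptions, not the authors' evaluation code or data.

```python
import math

def mse(pred, target):
    """Mean squared error between predicted and reference scores."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def pearson(x, y):
    """Linear (Pearson) correlation coefficient, i.e. LCC."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation (SRCC): Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy example with made-up scores in the 0..1 HAAQI range.
predicted = [0.20, 0.50, 0.55, 0.90]
reference = [0.25, 0.45, 0.60, 0.85]
print(pearson(predicted, reference), spearman(predicted, reference), mse(predicted, reference))
```

LCC measures how linearly the predictions track the reference scores, SRCC measures only whether the ordering is preserved, and MSE measures the absolute error, which is why the abstract reports all three.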

Problem

Research questions and friction points this paper is trying to address.

Intelligent Model

Music Sound Quality

Hearing Aid

Innovation

Methods, ideas, or system contributions that make the work stand out.

HAAQI-Net

Knowledge Distillation

Music Quality Assessment
