🤖 AI Summary
Conventional hearing aids rely on cascaded multi-module processing, which hinders personalized, adaptive amplification. To address this, we propose NeuroAMP, the first end-to-end deep neural amplification framework that jointly models speech spectrograms and individual pure-tone audiograms for subject-specific gain compensation. Key contributions include: (1) the first end-to-end neural amplification paradigm that explicitly incorporates audiometric data; (2) Denoising NeuroAMP, an architecture that integrates noise suppression with amplification; and (3) a data augmentation strategy oriented toward hearing rehabilitation. Trained on multi-source datasets (TIMIT, TMHINT, Cadenza MUSIC) and evaluated across Transformer, CNN, LSTM, and CRNN backbones, NeuroAMP achieves state-of-the-art objective performance: the Transformer variant attains a HASQI SRCC of 0.9927 on TIMIT, and Denoising NeuroAMP outperforms NAL-R+WDRC by about 10% in both HASPI and HASQI on VoiceBank+DEMAND.
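One simple way to feed a spectrogram and an audiogram to the same network is to append the audiogram to every time frame. The sketch below illustrates that idea only; the summary does not say how NeuroAMP fuses the two inputs, so the function name `condition_on_audiogram`, the eight-frequency audiogram, the 257 frequency bins, and the division by 100 are all assumptions made for illustration.

```python
import numpy as np

def condition_on_audiogram(log_spec, audiogram_db):
    """Append a listener's audiogram to every spectrogram frame.

    log_spec:     (frames, freq_bins) log-magnitude spectrogram.
    audiogram_db: (n_freqs,) hearing thresholds in dB HL.
    Returns an array of shape (frames, freq_bins + n_freqs).
    """
    log_spec = np.asarray(log_spec, dtype=np.float32)
    # Hypothetical scaling so thresholds fall roughly in [0, 1].
    audiogram = np.asarray(audiogram_db, dtype=np.float32) / 100.0
    # Repeat the same audiogram vector for each time frame.
    tiled = np.tile(audiogram, (log_spec.shape[0], 1))
    return np.concatenate([log_spec, tiled], axis=1)

# Toy inputs: 5 frames of a 257-bin spectrogram and an 8-point audiogram.
spec = np.zeros((5, 257), dtype=np.float32)
thresholds = [20, 25, 30, 40, 50, 60, 65, 70]  # made-up dB HL values
features = condition_on_audiogram(spec, thresholds)
print(features.shape)
```

Any of the four backbones could consume the resulting per-frame feature matrix, although other fusion schemes (for example a separate audiogram encoder) are equally plausible.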
📝 Abstract
The prevalence of hearing aids is increasing. However, optimizing the amplification processes of hearing aids remains challenging due to the complexity of integrating multiple modular components in traditional methods. To address this challenge, we present NeuroAMP, a novel deep neural network designed for end-to-end, personalized amplification in hearing aids. NeuroAMP takes both spectral features and the listener's audiogram as inputs, and we investigate four architectures: Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Convolutional Recurrent Neural Network (CRNN), and Transformer. We also introduce Denoising NeuroAMP, an extension that integrates noise reduction with amplification for improved performance in real-world scenarios. To enhance generalization, we employ a comprehensive data augmentation strategy during training on diverse speech (TIMIT and TMHINT) and music (Cadenza Challenge MUSIC) datasets. Evaluation using the Hearing Aid Speech Perception Index (HASPI), Hearing Aid Speech Quality Index (HASQI), and Hearing Aid Audio Quality Index (HAAQI) demonstrates that the Transformer architecture within NeuroAMP achieves the best performance, with Spearman's rank correlation coefficient (SRCC) scores of 0.9927 (HASQI) and 0.9905 (HASPI) on TIMIT, and 0.9738 (HAAQI) on the Cadenza Challenge MUSIC dataset. Notably, our data augmentation strategy maintains high performance on unseen datasets (e.g., VCTK, MUSDB18-HQ). Furthermore, Denoising NeuroAMP outperforms both the conventional NAL-R+WDRC approach and a two-stage baseline on the VoiceBank+DEMAND dataset, achieving a 10% improvement in both HASPI (0.90) and HASQI (0.59) scores. These results highlight the potential of NeuroAMP and Denoising NeuroAMP to deliver notable improvements in personalized hearing aid amplification.
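The SRCC figures above are Spearman rank correlations, which measure how well two sets of scores agree in ordering. The abstract does not state which two score sets are correlated, so the sketch below uses made-up toy values and hypothetical names (`reference_scores`, `model_scores`) purely to show how such a coefficient is computed: rank both lists (averaging tied ranks), then take the Pearson correlation of the ranks.

```python
import numpy as np

def average_ranks(values):
    """Return 1-based ranks, giving tied values their mean rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    # Replace the ranks of tied values with their average.
    for v in np.unique(values):
        mask = values == v
        ranks[mask] = ranks[mask].mean()
    return ranks

def srcc(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])

# Toy, made-up quality scores for five utterances (not from the paper).
reference_scores = [0.10, 0.35, 0.50, 0.72, 0.90]
model_scores = [0.12, 0.30, 0.55, 0.95, 0.80]  # last two swapped in order
print(round(srcc(reference_scores, model_scores), 4))  # → 0.9
```

Because only the ordering matters, a high SRCC indicates that the two score sets rank items similarly, not that their absolute values match.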