Leveraging Multiple Speech Enhancers for Non-Intrusive Intelligibility Prediction for Hearing-Impaired Listeners

📅 2025-09-21

🤖 AI Summary

Traditional speech intelligibility assessment relies on clean reference signals, which limits its applicability in realistic scenarios, particularly for hearing-impaired listeners. To address this, the authors propose a reference-free, non-intrusive deep learning framework. The method builds robust feature representations by running state-of-the-art speech enhancers (e.g., SEGAN, MetricGAN) in parallel and ensembling them, and it adds a 2-clips data augmentation strategy to model inter-individual auditory variability and improve generalization. Built on the CPC2 benchmark architecture, the framework predicts intelligibility end to end without reference speech. Experiments show consistent gains over non-intrusive baselines, including the CPC2 Champion, across multiple out-of-domain datasets. To the authors' knowledge, this is the first work to validate an enhancer-guided paradigm for reference-free speech intelligibility estimation.

📝 Abstract

Speech intelligibility evaluation for hearing-impaired (HI) listeners is essential for assessing hearing aid performance and has traditionally relied on listening tests or intrusive methods such as HASPI. However, these methods require clean reference signals, which are often unavailable in real-world conditions, creating a gap between lab-based and real-world assessments. To address this, we propose a non-intrusive intelligibility prediction framework that leverages speech enhancers to provide a parallel enhanced-signal pathway, enabling robust predictions without reference signals. We evaluate three state-of-the-art enhancers and demonstrate that prediction performance depends on the choice of enhancer, with ensembles of strong enhancers yielding the best results. To improve cross-dataset generalization, we introduce a 2-clips augmentation strategy that enhances listener-specific variability, boosting robustness on unseen datasets. Our approach consistently outperforms the non-intrusive baseline, the CPC2 Champion, across multiple datasets, highlighting the potential of enhancer-guided non-intrusive intelligibility prediction for real-world applications.

Problem

Research questions and friction points this paper is trying to address.

Predicting speech intelligibility for hearing-impaired listeners without clean reference signals

Addressing the gap between lab-based and real-world hearing aid assessments

Improving cross-dataset generalization for robust intelligibility prediction

Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging speech enhancers to provide a parallel enhanced-signal pathway

Using an ensemble of strong enhancers for the best prediction performance

Introducing 2-clips augmentation for cross-dataset generalization
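
The enhancer-ensemble idea listed above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the enhanced pathways are combined, so averaging per-enhancer scores is an assumption made here for illustration, and every name below (predict_intelligibility, attenuate, passthrough, toy_predictor) is hypothetical. The stand-in enhancers and predictor are toy functions, not real speech models.

```python
from statistics import fmean
from typing import Callable, List, Sequence

Signal = Sequence[float]
Enhancer = Callable[[Signal], List[float]]
# A predictor sees the noisy input and one enhanced version of it;
# no clean reference signal is involved (non-intrusive setting).
Predictor = Callable[[Signal, Signal], float]


def predict_intelligibility(
    noisy: Signal,
    enhancers: Sequence[Enhancer],
    predictor: Predictor,
) -> float:
    """Score the noisy signal once per enhancer and average the scores.

    ASSUMPTION: score averaging is one plausible way to ensemble
    enhancers; the source does not specify the combination rule.
    """
    if not enhancers:
        raise ValueError("at least one enhancer is required")
    scores = [predictor(noisy, enhance(noisy)) for enhance in enhancers]
    return fmean(scores)


# --- Hypothetical stand-ins, for demonstration only ---

def attenuate(signal: Signal) -> List[float]:
    """Toy 'enhancer' that halves every sample."""
    return [0.5 * s for s in signal]


def passthrough(signal: Signal) -> List[float]:
    """Toy 'enhancer' that returns the signal unchanged."""
    return list(signal)


def toy_predictor(noisy: Signal, enhanced: Signal) -> float:
    """Toy score in [0, 1]: smaller noisy/enhanced gap gives a higher score."""
    gap = fmean([abs(n - e) for n, e in zip(noisy, enhanced)])
    return 1.0 - min(1.0, gap)


if __name__ == "__main__":
    noisy_clip = [1.0, -1.0, 0.5, -0.5]
    score = predict_intelligibility(
        noisy_clip, [attenuate, passthrough], toy_predictor
    )
    print(f"ensemble score: {score}")
```

In a real system the toy functions would be replaced by trained enhancement models and a learned predictor; the point of the sketch is only that each enhancer supplies its own enhanced pathway alongside the noisy input, and that no reference signal enters the computation.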
