DeepGESI: A Non-Intrusive Objective Evaluation Model for Predicting Speech Intelligibility in Hearing-Impaired Listeners

📅 2025-12-22

🤖 AI Summary

Objective assessment of speech intelligibility for hearing-impaired listeners currently relies on clean reference signals, which are rarely available in clinical and hearing-aid fitting scenarios. To address this, the paper proposes DeepGESI, a non-intrusive deep learning model that predicts the hearing-loss-aware metric GESI (Gammachirp Envelope Similarity Index) without a reference. DeepGESI takes only the degraded speech as input and performs end-to-end regression, jointly modeling time-frequency acoustic representations and hearing-loss perception priors. Unlike the original reference-dependent GESI, it can therefore be applied where no clean signal exists. Evaluated on the CPC2 dataset, its predictions correlate strongly with the GESI scores computed by the intrusive metric (Spearman ρ > 0.92), and inference is over 20× faster than conventional methods. The result is an efficient, reference-free route to intelligibility assessment tailored to hearing impairment.

📝 Abstract

Speech intelligibility assessment is essential for many speech-related applications. However, most objective intelligibility metrics are intrusive: they require clean reference speech in addition to the degraded or processed signal. Furthermore, existing metrics such as STOI are designed primarily for normal-hearing listeners, and their accuracy in predicting intelligibility for hearing-impaired listeners remains limited. GESI (Gammachirp Envelope Similarity Index) can estimate intelligibility for hearing-impaired listeners, but it is also intrusive because it depends on reference signals, which limits its applicability in real-world scenarios. To overcome this limitation, this study proposes DeepGESI, a non-intrusive deep learning-based model that predicts the speech intelligibility of hearing-impaired listeners accurately and efficiently without requiring any clean reference speech. Experimental results show that, under the test conditions of the 2nd Clarity Prediction Challenge (CPC2) dataset, the GESI scores predicted by DeepGESI correlate strongly with the actual GESI scores. In addition, the proposed model predicts substantially faster than conventional methods.

Problem

Research questions and friction points this paper is trying to address.

Develops a non-intrusive model to predict speech intelligibility for hearing-impaired listeners

Eliminates the need for clean reference speech in objective intelligibility assessment

Predicts GESI scores faster than the intrusive GESI computation, and targets hearing-impaired listeners, for whom normal-hearing metrics such as STOI have limited accuracy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep learning model predicts intelligibility without reference speech

Targets hearing-impaired listeners by predicting GESI scores directly from the degraded signal

Faster prediction speed compared to conventional objective metrics
