End-to-End Audio-Visual Learning for Cochlear Implant Sound Coding in Noisy Environments

📅 2025-08-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the low speech intelligibility experienced by cochlear implant (CI) users in noisy and reverberant environments, this paper proposes AVSE-ECS, the first end-to-end audio-visual collaborative CI encoding system. AVSE-ECS jointly optimizes a deep audio-visual speech enhancement (AVSE) module with the ElectrodeNet-CS (ECS) coding strategy, overcoming the limitation of conventional two-stage approaches that train the enhancement and encoding components separately. By integrating multimodal features and performing end-to-end optimization directly on electrode stimulation sequences, the system improves perceptual quality under challenging acoustic conditions. Experiments demonstrate that AVSE-ECS significantly outperforms the previous ECS strategy across diverse noise conditions, achieving improvements of 12.6%–18.3% in objective intelligibility metrics, including STOI and ESTOI. These results validate the efficacy of multimodal deep learning for CI encoding and underscore its potential for clinical translation.

📝 Abstract
The cochlear implant (CI) is a remarkable biomedical device that successfully enables individuals with severe-to-profound hearing loss to perceive sound by converting speech into electrical stimulation signals. Despite advancements in the performance of recent CI systems, speech comprehension in noisy or reverberant conditions remains a challenge. Recent and ongoing developments in deep learning reveal promising opportunities for enhancing CI sound coding capabilities, not only through replicating traditional signal processing methods with neural networks, but also through integrating visual cues as auxiliary data for multimodal speech processing. Therefore, this paper introduces a novel noise-suppressing CI system, AVSE-ECS, which utilizes an audio-visual speech enhancement (AVSE) model as a pre-processing module for the deep-learning-based ElectrodeNet-CS (ECS) sound coding strategy. Specifically, a joint training approach is applied to model AVSE-ECS, an end-to-end CI system. Experimental results indicate that the proposed method outperforms the previous ECS strategy in noisy conditions, with improved objective speech intelligibility scores. The methods and findings in this study demonstrate the feasibility and potential of using deep learning to integrate the AVSE module into an end-to-end CI system.
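The key idea separating joint training from the conventional two-stage approach can be sketched with toy stand-in modules. In the sketch below, every name, dimension, and weight is hypothetical (not from the paper): `avse` fuses audio and visual features to predict an enhancement mask, `ecs` maps enhanced features to electrode stimulation levels, and the joint loss is computed on the electrode output rather than on the enhanced audio, so in a real trainable system the error signal would propagate back through both modules.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for illustration only (not taken from the paper)
N_FRAMES, N_AUDIO, N_VISUAL, N_ELECTRODES = 100, 64, 32, 22

# Toy weight matrices standing in for the trained AVSE and ECS networks
W_avse = rng.normal(scale=0.1, size=(N_AUDIO + N_VISUAL, N_AUDIO))
W_ecs = rng.normal(scale=0.1, size=(N_AUDIO, N_ELECTRODES))

def avse(noisy_audio, visual):
    """Toy AVSE stand-in: fuse both modalities, predict a sigmoid mask."""
    fused = np.concatenate([noisy_audio, visual], axis=1)
    mask = 1.0 / (1.0 + np.exp(-(fused @ W_avse)))
    return noisy_audio * mask

def ecs(enhanced):
    """Toy ElectrodeNet-CS stand-in: map enhanced features to electrode levels."""
    return enhanced @ W_ecs

noisy = rng.normal(size=(N_FRAMES, N_AUDIO))
visual = rng.normal(size=(N_FRAMES, N_VISUAL))
target = rng.normal(size=(N_FRAMES, N_ELECTRODES))  # electrodogram of clean speech

# Two-stage training would fit avse() against clean audio in isolation;
# joint (end-to-end) training instead takes the loss on the electrode
# output, so the AVSE weights are optimized for the coding objective.
joint_loss = float(np.mean((ecs(avse(noisy, visual)) - target) ** 2))
```

This is only a shape-level illustration of the data flow; the actual AVSE-ECS system uses deep networks and gradient-based joint optimization rather than fixed random projections.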
Problem

Research questions and friction points this paper is trying to address.

Enhancing cochlear implant speech comprehension in noisy environments
Integrating audio-visual cues into improved sound coding strategies
Developing an end-to-end deep learning system for noise suppression
Innovation

Methods, ideas, or system contributions that make the work stand out.

Audio-visual speech enhancement as a pre-processing module
Joint training of an end-to-end cochlear implant system
Deep learning integration for noise suppression
Meng-Ping Lin
Graduate Institute of Electronics Engineering, National Taiwan University, Taipei 106319, Taiwan
Enoch Hsin-Ho Huang
Research Center for Information Technology Innovation, Academia Sinica, Taipei 115201, Taiwan
Shao-Yi Chien
Professor of Electrical Engineering, National Taiwan University
Multimedia Signal Processing; Multimedia System-on-a-Chip Design
Yu Tsao
Research Fellow (Professor), Deputy Director, CITI, Academia Sinica
Assistive Oral Communication Technologies; Speech Enhancement; Voice Conversion; Speech Assessment