Leveraging Spatial Cues from Cochlear Implant Microphones to Efficiently Enhance Speech Separation in Real-World Listening Scenes

📅 2025-01-24
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Cochlear implant (CI) users show markedly degraded speech separation performance in realistic reverberant environments. Method: This paper proposes a robust speech separation method that integrates implicit and explicit spatial cues. We first systematically quantify the impact of real-world acoustic conditions on CI-based speech separation. To strengthen spatial awareness, particularly when implicit cues are limited (e.g., with a unilateral CI), we design an auxiliary mechanism that supplies explicit spatial features derived from sound source localization, interaural time differences (ITDs), and interaural level differences (ILDs). Our deep learning architecture jointly optimizes end-to-end learned implicit representations alongside these physically interpretable explicit features. Results: The method substantially improves separation quality in challenging scenarios with adjacent or spatially overlapping talkers: it improves SI-SNRi by 2.3 dB for single-channel CI input and notably improves discriminability when talkers' voices are similar.
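
The explicit spatial features named above (ITDs and ILDs) can be computed directly from the two CI microphone signals before being passed to the separation network. The sketch below is not from the paper; it is a minimal illustration, assuming numpy, a bilateral recording given as two 1-D waveforms, and GCC-PHAT as the ITD estimator (the paper does not specify which estimator it uses), of how frame-wise ITD and ILD features could be extracted as auxiliary inputs.

```python
import numpy as np

def explicit_spatial_cues(left, right, fs, n_fft=512, hop=256, eps=1e-8):
    """Estimate frame-wise ITD (seconds) and ILD (dB) from a bilateral recording.

    left, right : 1-D numpy arrays holding the left/right CI microphone signals.
    Returns (itd, ild), one value per short-time frame.
    """
    win = np.hanning(n_fft)
    n_frames = 1 + (len(left) - n_fft) // hop
    itd = np.zeros(n_frames)
    ild = np.zeros(n_frames)
    for t in range(n_frames):
        seg_l = left[t * hop:t * hop + n_fft] * win
        seg_r = right[t * hop:t * hop + n_fft] * win
        L = np.fft.rfft(seg_l)
        R = np.fft.rfft(seg_r)
        # ILD: power ratio between the two ears, expressed in dB.
        ild[t] = 10.0 * np.log10((np.sum(np.abs(L) ** 2) + eps) /
                                 (np.sum(np.abs(R) ** 2) + eps))
        # ITD: lag of the GCC-PHAT cross-correlation peak, converted to seconds.
        cross = L * np.conj(R)
        cross /= np.abs(cross) + eps          # phase transform (PHAT) weighting
        cc = np.fft.irfft(cross, n=n_fft)
        cc = np.concatenate((cc[-n_fft // 2:], cc[:n_fft // 2]))  # center zero lag
        lag = np.argmax(cc) - n_fft // 2
        # In practice the search would be limited to physiologically plausible
        # lags (|ITD| below roughly 1 ms); omitted here for brevity.
        itd[t] = lag / fs
    return itd, ild
```

In the paper's framing, such per-frame features would be supplied alongside the network's learned (implicit) representation of the mixture, which is why they help most when the implicit cues available from a single CI microphone are weak.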

📝 Abstract
Speech separation approaches for single-channel, dry speech mixtures have significantly improved. However, real-world spatial and reverberant acoustic environments remain challenging, limiting the effectiveness of these approaches for assistive hearing devices like cochlear implants (CIs). To address this, we quantify the impact of real-world acoustic scenes on speech separation and explore how spatial cues can enhance separation quality efficiently. We analyze performance based on implicit spatial cues (inherent in the acoustic input and learned by the model) and explicit spatial cues (manually calculated spatial features added as auxiliary inputs). Our findings show that spatial cues (both implicit and explicit) improve separation for mixtures with spatially separated and nearby talkers. Furthermore, spatial cues enhance separation when spectral cues are ambiguous, such as when voices are similar. Explicit spatial cues are particularly beneficial when implicit spatial cues are weak. For instance, single CI microphone recordings provide weaker implicit spatial cues than bilateral CIs, but even single CIs benefit from explicit cues. These results emphasize the importance of training models on real-world data to improve generalizability in everyday listening scenarios. Additionally, our statistical analyses offer insights into how data properties influence model performance, supporting the development of efficient speech separation approaches for CIs and other assistive devices in real-world settings.
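
Separation quality in this line of work is typically reported as scale-invariant SNR improvement (SI-SNRi), the metric cited in the summary above. Below is a minimal numpy sketch of that metric, assuming time-aligned 1-D signals; it is illustrative only and not taken from the paper's evaluation code.

```python
import numpy as np

def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-noise ratio (SI-SNR) in dB."""
    # Remove means so the measure is invariant to DC offsets.
    estimate = estimate - np.mean(estimate)
    reference = reference - np.mean(reference)
    # Project the estimate onto the reference (optimal rescaling of the target).
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    s_target = scale * reference
    e_noise = estimate - s_target
    return 10.0 * np.log10((np.sum(s_target ** 2) + eps) /
                           (np.sum(e_noise ** 2) + eps))

def si_snri(separated, mixture, reference):
    """SI-SNRi: gain of the separated signal over the unprocessed mixture."""
    return si_snr(separated, reference) - si_snr(mixture, reference)
```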
Problem

Research questions and friction points this paper is trying to address.

Auditory Separation
Cochlear Implants
Complex Acoustic Environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Spatial Information
Speech Separation
Real-world Data
Feyisayo Olalere
Radboud University, The Netherlands
Kiki van der Heijden
Assistant Professor, Radboud University; Research Fellow, Columbia University
auditory neuroscience, neuroimaging, sound localization, sound encoding
Christiaan H. Stronks
Department of Otorhinolaryngology, Leiden University Medical Centre, The Netherlands; Leiden Institute for Brain and Cognition, Leiden, The Netherlands
Jeroen Briaire
Department of Otorhinolaryngology, Leiden University Medical Centre, The Netherlands
Johan HM Frijns
Department of Otorhinolaryngology, Leiden University Medical Centre, The Netherlands; Leiden Institute for Brain and Cognition, Leiden, The Netherlands; Department of Bioelectronics, Delft University of Technology, Delft, The Netherlands
M. Gerven
Radboud University, The Netherlands