Bone-conduction Guided Multimodal Speech Enhancement with Conditional Diffusion Models

2026-01-18
Citations: 0
Influential: 0
AI Summary
This work proposes a novel multimodal speech enhancement framework built on a conditional diffusion model. It targets two problems: the sharp performance degradation of single-channel systems in extremely noisy environments, and the open challenge of effectively fusing bone-conducted (BC) and air-conducted (AC) signals. The noise-robust BC signal serves as a conditioning cue that guides the diffusion process and is processed jointly with the AC speech signal. The authors report that the method outperforms both previously established multimodal approaches and a strong unimodal diffusion baseline across a wide range of acoustic conditions.
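
The conditioning idea can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the geometric noise schedule, the channel-stacking form of conditioning, the noise-prediction loss, and every name here (`forward_diffuse`, `build_network_input`, `toy_score_model`, `denoising_loss`) are assumptions chosen for illustration, and the "network" is a hypothetical linear stand-in. The sketch only shows one common way a noise-robust BC feature map could be supplied to a diffusion model next to the noisy AC input.

```python
import numpy as np


def forward_diffuse(clean, t, rng, sigma_min=0.05, sigma_max=0.5):
    """Add Gaussian noise to clean features; the noise scale grows with t in [0, 1]."""
    # Geometric noise schedule (an assumption, not taken from the paper).
    sigma = sigma_min * (sigma_max / sigma_min) ** t
    noise = rng.standard_normal(clean.shape)
    return clean + sigma * noise, noise, sigma


def build_network_input(state, noisy_ac, bc):
    """Stack the diffusion state with both conditioning signals on a channel axis."""
    return np.stack([state, noisy_ac, bc], axis=0)


def toy_score_model(net_input, weights):
    """Hypothetical stand-in for a neural network: a weighted sum over channels."""
    return np.tensordot(weights, net_input, axes=([0], [0]))


def denoising_loss(clean, noisy_ac, bc, weights, t, rng):
    """Mean squared error between the predicted and the injected noise."""
    state, noise, _ = forward_diffuse(clean, t, rng)
    pred = toy_score_model(build_network_input(state, noisy_ac, bc), weights)
    return float(np.mean((pred - noise) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal((257, 100))           # toy "spectrogram" (freq x time)
    noisy_ac = clean + rng.standard_normal((257, 100))  # AC input corrupted by noise
    bc = 0.5 * clean                                  # toy BC signal: clean but attenuated
    weights = np.array([1.0, 0.0, -2.0])              # arbitrary illustrative weights
    print(denoising_loss(clean, noisy_ac, bc, weights, t=0.5, rng=rng))
```

In a real system the weighted sum would be replaced by a trained network, and the BC signal could instead enter through a separate encoder or attention layers; channel stacking is only the simplest option.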

Abstract
Single-channel speech enhancement models face significant performance degradation in extremely noisy environments. While prior work has shown that complementary bone-conducted speech can guide enhancement, effective integration of this noise-immune modality remains a challenge. This paper introduces a novel multimodal speech enhancement framework that integrates bone-conduction sensors with air-conducted microphones using a conditional diffusion model. Our proposed model significantly outperforms previously established multimodal techniques and a powerful diffusion-based single-modal baseline across a wide range of acoustic conditions.
Problem

Research questions and friction points this paper is trying to address.

speech enhancement
bone conduction
multimodal
noisy environments
conditional diffusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

bone-conduction
multimodal speech enhancement
conditional diffusion model
noise-immune modality
speech enhancement
Sina Khanagha
Signal Processing Group, University of Hamburg, Germany
Bunlong Lay
Signal Processing Group, University of Hamburg, Germany
Timo Gerkmann
Signal Processing, Computer Science Department, Universität Hamburg, Germany
Speech Enhancement, Speech and Audio Processing, Acoustic Signal Processing