Bone-conduction Guided Multimodal Speech Enhancement with Conditional Diffusion Models

📅 2026-01-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes a multimodal speech enhancement framework built on a conditional diffusion model. It targets two problems: the sharp performance degradation of single-channel systems in extremely noisy environments, and the open question of how to fuse bone-conducted (BC) and air-conducted (AC) signals effectively. The noise-robust BC signal is used as a conditioning cue that guides the diffusion process, which the work presents as novel, so that both modalities are exploited jointly when enhancing the AC speech signal. Across a wide range of acoustic conditions, the proposed method outperforms both previously established multimodal approaches and a strong unimodal diffusion baseline.
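To make the idea of "BC as a conditioning cue" concrete, here is a minimal sketch of how conditioning inputs can be attached to a diffusion state. It is not the paper's implementation: the summary does not specify the diffusion formulation, the network, or the fusion mechanism. The sketch assumes a generic DDPM-style variance-preserving forward process and channel-wise stacking of the conditioning signals. All function names, the linear beta schedule, the feature shapes, and the placeholder signals are hypothetical.

```python
import numpy as np

def make_beta_schedule(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Linear noise schedule (an assumption; the source does not name one)."""
    return np.linspace(beta_start, beta_end, num_steps)

def forward_diffuse(clean_ac, step, betas, rng):
    """Sample the noised state x_t from q(x_t | x_0) in a DDPM-style process."""
    alpha_bar = np.cumprod(1.0 - betas)[step]
    noise = rng.standard_normal(clean_ac.shape)
    noisy_state = np.sqrt(alpha_bar) * clean_ac + np.sqrt(1.0 - alpha_bar) * noise
    return noisy_state, noise

def build_conditioned_input(noisy_state, noisy_ac, bc):
    """Stack the diffusion state with both conditioning signals along a new
    channel axis. Channel stacking is one common conditioning choice and is
    assumed here; the paper may fuse the modalities differently."""
    if not (noisy_state.shape == noisy_ac.shape == bc.shape):
        raise ValueError("all inputs must share the same feature shape")
    return np.stack([noisy_state, noisy_ac, bc], axis=0)

# Placeholder features (frequency bins x frames); random data, not real speech.
rng = np.random.default_rng(0)
betas = make_beta_schedule(1000)
clean_ac = rng.standard_normal((64, 100))
noisy_ac = clean_ac + rng.standard_normal((64, 100))  # AC microphone in noise
bc = 0.5 * clean_ac                                   # stand-in for a BC sensor signal

noisy_state, noise = forward_diffuse(clean_ac, 500, betas, rng)
net_input = build_conditioned_input(noisy_state, noisy_ac, bc)
```

In a setup like this, a denoising network would receive `net_input` together with the step index and be trained to predict `noise`. The BC channel gives the network a view of the speech that is largely unaffected by ambient noise.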

📝 Abstract

Single-channel speech enhancement models face significant performance degradation in extremely noisy environments. While prior work has shown that complementary bone-conducted speech can guide enhancement, effective integration of this noise-immune modality remains a challenge. This paper introduces a novel multimodal speech enhancement framework that integrates bone-conduction sensors with air-conducted microphones using a conditional diffusion model. Our proposed model significantly outperforms previously established multimodal techniques and a powerful diffusion-based single-modal baseline across a wide range of acoustic conditions.

Problem

Research questions and friction points this paper is trying to address.

speech enhancement

bone conduction

multimodal

noisy environments

conditional diffusion

Innovation

Methods, ideas, or system contributions that make the work stand out.

bone-conduction

multimodal speech enhancement

conditional diffusion model