Leveraging Joint Spectral and Spatial Learning with MAMBA for Multichannel Speech Enhancement

📅 2024-09-16
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address insufficient spatiotemporal feature modeling in multichannel speech enhancement under dynamic acoustic conditions, this paper proposes MCMamba—an architecture that integrates an enhanced Mamba state-space model into a multichannel framework for joint adaptive modeling of full-band/narrowband spectral features and spatial information (e.g., spatial covariance). The method combines multi-microphone array signal processing, sub-band spectral decomposition, and spatial-domain feature optimization, overcoming the limitations of conventional CNNs and RNNs in capturing long-range spectral dependencies. Evaluated on the CHiME-3 dataset, MCMamba achieves state-of-the-art performance: a PESQ improvement of 1.27 and a 3.8-percentage-point gain in STOI, significantly outperforming baselines such as McNet. These results support Mamba’s effectiveness and generalizability in modeling speech spectral-temporal dynamics within multichannel enhancement systems.
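The summary mentions spatial covariance as a source of spatial information. As a minimal, illustrative sketch (not the paper's actual feature pipeline), the per-frequency spatial covariance of a multichannel STFT can be computed by averaging outer products of the microphone channel vectors over time frames:

```python
import numpy as np

def spatial_covariance(X):
    """Per-frequency spatial covariance of a multichannel STFT.

    X: complex array of shape (mics, frames, freqs).
    Returns a (freqs, mics, mics) array: for each frequency bin,
    the time-averaged outer product x_f(t) x_f(t)^H across frames.
    """
    M, T, F = X.shape
    # einsum computes sum_t X[m,t,f] * conj(X[n,t,f]) for each f, m, n
    return np.einsum('mtf,ntf->fmn', X, X.conj()) / T

# Toy usage with random 4-mic, 10-frame, 8-bin STFT data
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 10, 8)) + 1j * rng.standard_normal((4, 10, 8))
cov = spatial_covariance(X)  # shape (8, 4, 4), Hermitian per bin
```

Each per-bin covariance matrix is Hermitian positive semi-definite; its off-diagonal phase encodes inter-microphone time differences, which is the spatial cue such features expose to the network.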

📝 Abstract
In multichannel speech enhancement, effectively capturing spatial and spectral information across different microphones is crucial for noise reduction. Traditional methods, such as CNNs or LSTMs, attempt to model the temporal dynamics of full-band and sub-band spectral and spatial features. However, these approaches face limitations in fully modeling complex temporal dependencies, especially in dynamic acoustic environments. To overcome these challenges, we modify the current advanced model McNet by introducing an improved version of Mamba, a state-space model, and further propose MCMamba. MCMamba has been completely reengineered to integrate full-band and narrow-band spatial information with sub-band and full-band spectral features, providing a more comprehensive approach to modeling spatial and spectral information. Our experimental results demonstrate that MCMamba significantly improves the modeling of spatial and spectral features in multichannel speech enhancement, outperforming McNet and achieving state-of-the-art performance on the CHiME-3 dataset. Additionally, we find that Mamba performs exceptionally well in modeling spectral information.
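Mamba builds on linear state-space models. As a minimal sketch of the underlying recurrence only (Mamba's actual selective, input-dependent parameterization and the paper's MCMamba blocks are more involved), a discrete linear SSM updates a hidden state per time step and reads out an output:

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Sequential scan of a discrete linear state-space model:
        h_t = A @ h_{t-1} + B @ x_t
        y_t = C @ h_t
    x: (T, d_in); A: (d_state, d_state); B: (d_state, d_in);
    C: (d_out, d_state). Returns y: (T, d_out).
    """
    h = np.zeros(A.shape[0])
    ys = []
    for t in range(x.shape[0]):
        h = A @ h + B @ x[t]   # state update carries long-range context
        ys.append(C @ h)       # readout at each step
    return np.stack(ys)

# Toy 1-state example: A=0.9 makes the SSM a leaky integrator
A = np.array([[0.9]])
B = np.array([[1.0]])
C = np.array([[1.0]])
y = ssm_scan(np.ones((4, 1)), A, B, C)
# y.flatten() -> [1.0, 1.9, 2.71, 3.439]
```

The decaying state transition is what lets SSM-based layers capture long-range temporal dependencies along the frame axis with linear cost in sequence length, the property the abstract contrasts against CNN/LSTM modeling.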
Problem

Research questions and friction points this paper is trying to address.

Multichannel speech enhancement
Complex acoustic environments
Background noise reduction
Innovation

Methods, ideas, or system contributions that make the work stand out.

MCMamba
spatial and spectral features
multichannel speech enhancement