Microphone Array Signal Processing and Deep Learning for Speech Enhancement: Combining model-based and data-driven approaches to parameter estimation and filtering [Special Issue On Model-Based and Data-Driven Audio Signal Processing]

📅 2024-11-01
🏛️ IEEE Signal Processing Magazine
📈 Citations: 0
Influential: 0
🤖 AI Summary
In multichannel speech enhancement, jointly suppressing noise, reverberation, and interfering sources makes accurate estimation of spatial filter parameters difficult. To address this, we propose a framework that integrates the model-based and data-driven paradigms. Methodologically, we conduct the first systematic comparative analysis of three paradigms (purely model-based, purely data-driven, and hybrid) and unify microphone array signal processing, optimal spatial filtering (MVDR/GEVD), deep neural networks, and statistical modeling within a joint optimization training framework. This preserves both physical interpretability and data adaptivity throughout parameter estimation and filtering. Experiments demonstrate substantial improvements: the proposed method achieves average SNR gains of 1.2–2.4 dB and significantly outperforms single-paradigm approaches in noise suppression, speech separation, and dereverberation, as measured by PESQ and STOI.

📝 Abstract
Multichannel acoustic signal processing is a well-established and powerful tool to exploit the spatial diversity between a target signal and nontarget or noise sources for signal enhancement. However, the textbook solutions for optimal data-dependent spatial filtering rest on the knowledge of second-order statistical moments of the signals, which have traditionally been difficult to acquire. In this contribution, we compare model-based, purely data-driven, and hybrid approaches to parameter estimation and filtering, where the latter tries to combine the benefits of model-based signal processing and data-driven deep learning to overcome their individual deficiencies. We illustrate the underlying design principles with examples from noise reduction, source separation, and dereverberation.
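
As a rough illustration of the hybrid idea described in the abstract, the sketch below estimates the second-order statistics (spatial covariance matrices) from time-frequency masks and plugs them into a classical MVDR beamformer. It is a minimal sketch under stated assumptions, not the article's implementation: the masks are assumed to come from some neural network that is not shown here, the function names `masked_scm`, `mvdr_weights`, and `apply_beamformer` are hypothetical, and the MVDR weights use the reference-microphone formulation w = (Φ_n⁻¹ Φ_s) u / tr(Φ_n⁻¹ Φ_s), which is one common choice among several.

```python
import numpy as np


def masked_scm(Y, mask, eps=1e-10):
    """Mask-weighted spatial covariance matrix for each frequency bin.

    Y:    complex STFT of the array signal, shape (F, T, M)
    mask: real-valued time-frequency mask in [0, 1], shape (F, T)
          (assumed to be produced by a neural network, not shown here)
    Returns an array of shape (F, M, M).
    """
    # sum_t mask[f, t] * y[f, t] y[f, t]^H, normalized by the mask energy
    num = np.einsum("ft,ftm,ftn->fmn", mask, Y, Y.conj())
    den = mask.sum(axis=1)[:, None, None] + eps
    return num / den


def mvdr_weights(phi_s, phi_n, ref=0, diag_load=1e-6):
    """MVDR weights w = (Phi_n^-1 Phi_s) u_ref / trace(Phi_n^-1 Phi_s).

    phi_s, phi_n: target and noise covariance matrices, shape (F, M, M)
    ref:          index of the reference microphone
    diag_load:    relative diagonal loading for numerical stability
    Returns weights of shape (F, M).
    """
    M = phi_n.shape[-1]
    # Diagonal loading, scaled by the average noise power per channel
    power = np.trace(phi_n, axis1=1, axis2=2).real / M
    phi_n = phi_n + diag_load * power[:, None, None] * np.eye(M)
    # Solve Phi_n X = Phi_s instead of forming an explicit inverse
    num = np.linalg.solve(phi_n, phi_s)
    tr = np.trace(num, axis1=1, axis2=2)[:, None]
    return num[:, :, ref] / (tr + 1e-10)


def apply_beamformer(w, Y):
    """Apply the filter per bin: output[f, t] = w[f]^H y[f, t]."""
    return np.einsum("fm,ftm->ft", w.conj(), Y)
```

With a speech mask `m_s` and a noise mask `m_n` of shape `(F, T)`, the enhanced STFT would be obtained as `apply_beamformer(mvdr_weights(masked_scm(Y, m_s), masked_scm(Y, m_n)), Y)`. The network only supplies the masks; the filter itself remains the closed-form, interpretable model-based solution.
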
Problem

Research questions and friction points this paper is trying to address.

Multichannel Audio Signal Processing
Noise Reduction
Source Separation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multichannel Acoustic Signal Processing
Noise Reduction and Dereverberation
Model-Based vs. Data-Driven Learning