🤖 AI Summary
In multi-channel speech enhancement, jointly suppressing noise, reverberation, and interfering sources poses significant challenges for accurate spatial filter parameter estimation. To address this, we propose a synergistic modeling framework that integrates model-driven and data-driven paradigms. Methodologically, we conduct the first systematic comparative analysis of three paradigms—purely model-driven, purely data-driven, and hybrid—and unify microphone array signal processing, optimal spatial filtering (MVDR/GEVD), deep neural networks, and statistical modeling within a joint optimization training framework. This ensures both physical interpretability and data adaptivity throughout parameter estimation and filtering. Experiments demonstrate substantial improvements: the proposed method achieves average SNR gains of 1.2–2.4 dB and significantly outperforms single-paradigm approaches in noise suppression, speech separation, and dereverberation, as measured by PESQ and STOI.
📝 Abstract
Multichannel acoustic signal processing is a well-established and powerful tool to exploit the spatial diversity between a target signal and nontarget or noise sources for signal enhancement. However, the textbook solutions for optimal data-dependent spatial filtering rest on the knowledge of second-order statistical moments of the signals, which have traditionally been difficult to acquire. In this contribution, we compare model-based, purely data-driven, and hybrid approaches to parameter estimation and filtering, where the latter tries to combine the benefits of model-based signal processing and data-driven deep learning to overcome their individual deficiencies. We illustrate the underlying design principles with examples from noise reduction, source separation, and dereverberation.