🤖 AI Summary
This work addresses the challenging problem of directional and distance-aware speech separation in complex acoustic environments. We propose a novel microphone array-based speech separation framework that jointly exploits directional and distance cues. Specifically, we design an enhanced delay-and-sum beamformer to extract directional features and, for the first time, explicitly incorporate the direct-to-reverberant ratio (DRR) as a distance-sensitive feature; both are fed jointly into a neural network for end-to-end modeling. This approach enables synergistic integration of directional and distance information, overcoming the limitation of conventional methods that rely solely on angular estimates. Evaluated on the realistic multi-channel CHiME-8 MMCSG dataset, our method achieves significant improvements over state-of-the-art approaches on key metrics, including STOI and SI-SNR, demonstrating superior robustness and practicality in scenarios with concurrent reverberation and interference.
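The summary does not spell out the enhanced delay-and-sum variant, but the underlying operation is standard: delay each microphone signal so that arrivals from the target direction add coherently, then average across channels. The sketch below is a minimal far-field implementation in NumPy; the function name, the 2-D planar array geometry, and the frequency-domain fractional-delay approach are illustrative assumptions, not the authors' actual design.

```python
import numpy as np

def delay_and_sum(mic_signals, mic_positions, target_doa_deg, fs, c=343.0):
    """Plain delay-and-sum beamformer steered toward a target direction.

    mic_signals:    (M, T) time-domain signals, one row per microphone.
    mic_positions:  (M, 2) microphone x/y coordinates in metres.
    target_doa_deg: azimuth of the desired source in degrees.
    """
    M, T = mic_signals.shape
    theta = np.deg2rad(target_doa_deg)
    # Far-field assumption: unit vector pointing toward the source.
    direction = np.array([np.cos(theta), np.sin(theta)])
    # Per-mic arrival-time offsets (seconds) relative to the array origin;
    # mics closer to the source receive the wavefront earlier.
    delays = mic_positions @ direction / c
    # Compensate with fractional delays in the frequency domain, then average.
    freqs = np.fft.rfftfreq(T, d=1.0 / fs)          # (F,)
    spectra = np.fft.rfft(mic_signals, axis=1)      # (M, F)
    phase = np.exp(-2j * np.pi * freqs[None, :] * delays[:, None])
    aligned = spectra * phase
    return np.fft.irfft(aligned.mean(axis=0), n=T)
```

After alignment, the target-direction signal sums in phase while off-axis sources add incoherently, which is what makes the beamformer output a useful directional feature for the downstream network.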
📝 Abstract
In this paper, we introduce a neural network-based method for regional speech separation using a microphone array. This approach leverages novel spatial cues to extract the sound source not only from a specified direction but also within a defined distance. Specifically, our method employs an improved delay-and-sum technique to obtain directional cues, substantially enhancing the signal from the target direction. We further improve separation by incorporating the direct-to-reverberant ratio into the input features, enabling the model to better discriminate between sources within and beyond a specified distance. Experimental results demonstrate that our proposed method yields substantial gains across multiple objective metrics. Furthermore, it achieves state-of-the-art performance on the CHiME-8 MMCSG dataset, which was recorded in real-world conversational scenarios, underscoring its effectiveness for speech separation in practical applications.
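For intuition on the distance cue: the direct-to-reverberant ratio compares the energy of the direct-path arrival with the energy of later reflections, and it falls as a source moves farther from the array, which is what makes it distance-sensitive. The sketch below computes DRR from a known room impulse response using a common ±2.5 ms direct-path window (a convention from the ACE challenge, not necessarily the paper's); the paper presumably derives the feature blindly from the array signals, so this only illustrates the quantity's definition.

```python
import numpy as np

def direct_to_reverberant_ratio(rir, fs, direct_ms=2.5):
    """Estimate DRR in dB from a room impulse response.

    Energy within +/- direct_ms of the main peak is treated as the
    direct path; everything after the window counts as reverberation.
    """
    peak = np.argmax(np.abs(rir))
    half = int(direct_ms * 1e-3 * fs)
    lo, hi = max(peak - half, 0), peak + half + 1
    direct_energy = np.sum(rir[lo:hi] ** 2)
    reverb_energy = np.sum(rir[hi:] ** 2)
    return 10.0 * np.log10(direct_energy / (reverb_energy + 1e-12))
```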