🤖 AI Summary
In guided source separation (GSS)-based far-field speech enhancement, the choice of reference microphone strongly affects both output signal quality and downstream automatic speech recognition (ASR) performance. Conventional selection relies on the signal-to-noise ratio (SNR), which is well suited to noise reduction but ignores differences in the early-to-late reverberation ratio (ELR) across microphones, leading to insufficient reverberation suppression. To address this, the authors propose reference microphone selection criteria based on the normalized ℓ<sub>p</sub> norm, used either on its own or jointly with the SNR, so that both the noise and reverberation characteristics of the microphone signals are taken into account. Embedded in a CHiME-8 distant-ASR frontend, the proposed methods outperform the SNR-based baseline, reducing the macro-averaged word error rate (WER) under realistic far-field conditions.
📝 Abstract
Guided Source Separation (GSS) is a popular front-end for distant automatic speech recognition (ASR) systems using spatially distributed microphones. With such distributed arrays, the choice of reference microphone can have a large influence on the quality of the output signal and the downstream ASR performance. In GSS-based speech enhancement, reference microphone selection is typically performed using the signal-to-noise ratio (SNR), which is optimal for noise reduction but may neglect differences in the early-to-late-reverberant ratio (ELR) across microphones. In this paper, we propose two reference microphone selection methods for GSS-based speech enhancement based on the normalized $\ell_p$-norm: one using only the normalized $\ell_p$-norm, and one combining the normalized $\ell_p$-norm with the SNR to account for differences in both SNR and ELR across microphones. Experimental evaluation using a CHiME-8 distant ASR system shows that the proposed $\ell_p$-norm-based methods outperform the SNR-based baseline, reducing the macro-average word error rate.
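To make the selection criterion concrete, the sketch below shows one plausible way to score microphones by a normalized $\ell_p$-norm of their STFT magnitudes and optionally combine that score with a per-microphone SNR estimate. The exact normalization, the value of $p$, and the weighting `alpha` are not specified in the abstract, so they are assumptions here, not the paper's definitions; the key idea is that for $p < 2$ a sparser (less reverberant) magnitude spectrogram yields a smaller normalized $\ell_p$-norm.

```python
import numpy as np

def normalized_lp_norm(stft_mag, p=0.5):
    """Sparsity score of a magnitude spectrogram.

    Assumed form: ell_p norm divided by the ell_2 norm. For p < 2,
    sparser (typically less reverberant) spectrograms give SMALLER
    values; a flat spectrogram gives the largest value.
    """
    x = np.abs(stft_mag).ravel()
    lp = np.sum(x ** p) ** (1.0 / p)
    l2 = np.linalg.norm(x)
    return lp / (l2 + 1e-12)

def select_reference_mic(stft_mags, snrs=None, alpha=0.5, p=0.5):
    """Pick a reference microphone index.

    stft_mags: list of per-microphone magnitude spectrograms.
    snrs: optional per-microphone SNR estimates; when given, the
    (negated) sparsity score and the SNR are mixed with a weight
    `alpha` (a placeholder -- in practice the two quantities would
    need consistent scaling).
    """
    # Negate so that sparser signals (smaller norm) score higher.
    scores = np.array([-normalized_lp_norm(m, p) for m in stft_mags])
    if snrs is not None:
        scores = alpha * scores + (1.0 - alpha) * np.asarray(snrs, dtype=float)
    return int(np.argmax(scores))
```

A microphone close to the talker sees a spectrogram dominated by a few strong direct-path components, while a distant, reverberant microphone sees energy smeared across many time-frequency bins; the normalized $\ell_p$-norm separates these two cases even when their broadband SNRs are similar.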