ParaNoise-SV: Integrated Approach for Noise-Robust Speaker Verification with Parallel Joint Learning of Speech Enhancement and Noise Extraction

📅 2025-08-10
🤖 AI Summary
Existing noise-robust speaker verification methods rely on implicit noise suppression, which makes it hard to disentangle noise from speaker-specific features and limits robustness gains. To address this, ParaNoise-SV uses a parallel joint-learning framework in which a noise extraction (NE) network and a speech enhancement (SE) network operate side by side. Built on a dual U-Net architecture, the NE branch models noise explicitly while the SE branch refines speech under its guidance, with the aim of preserving discriminative speaker representations. The reported result is an 8.4% relative reduction in equal error rate (EER) compared with previous joint SE-SV models.

📝 Abstract
Noise-robust speaker verification leverages joint learning of speech enhancement (SE) and speaker verification (SV) to improve robustness. However, prevailing approaches rely on implicit noise suppression and do not explicitly distinguish noise from speech during training, so they struggle to separate noise from speaker characteristics. Although integrating SE and SV helps, it remains limited in handling noise effectively. Meanwhile, recent SE studies suggest that explicitly modeling noise, rather than merely suppressing it, enhances noise resilience. Reflecting this, we propose ParaNoise-SV, which uses dual U-Nets that combine a noise extraction (NE) network and an SE network. The NE U-Net explicitly models noise, while the SE U-Net refines speech with guidance from NE through parallel connections, preserving speaker-relevant features. Experimental results show that ParaNoise-SV achieves an 8.4% relative reduction in equal error rate (EER) compared with previous joint SE-SV models.
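To make the headline metric concrete, here is a minimal sketch of how an equal error rate and a relative reduction can be computed. It is not the paper's evaluation code: the function names, the threshold-sweep approximation, the trial scores, and the 5.00% baseline are all hypothetical and chosen only for illustration.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    Returns the mean of the false acceptance and false rejection rates at
    the threshold where the two are closest.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = None, None
    for t in thresholds:
        # False rejection: same-speaker trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: different-speaker trials scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_reduction(baseline_eer, new_eer):
    """Fraction by which new_eer is lower than baseline_eer."""
    return (baseline_eer - new_eer) / baseline_eer


# Hypothetical trial scores, not taken from the paper.
targets = [0.9, 0.8, 0.7, 0.4]
nontargets = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(targets, nontargets))  # 0.25

# Hypothetical baseline: an 8.4% relative reduction takes 5.00% EER to 4.58%.
print(f"{relative_reduction(5.00, 4.58):.3f}")  # 0.084
```

Note that a relative reduction is measured against the baseline EER, so 8.4% here does not mean the EER drops by 8.4 percentage points.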

Problem

Research questions and friction points this paper is trying to address.

Explicitly modeling noise for speaker verification robustness
Separating noise from speaker characteristics in training
Improving noise resilience through parallel joint learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Parallel dual U-Nets for joint learning
Explicit noise extraction network modeling
Guided speech enhancement preserving features
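
The data flow behind the parallel design can be sketched as a toy example. This is only an illustration of two branches running side by side with the SE branch receiving NE features at each level. The summary does not say how the parallel connections fuse features, so the subtraction rule below is an assumption, and `scale` is a hypothetical stand-in for a real U-Net block.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Stage = Callable[[Vector], Vector]


def scale(factor: float) -> Stage:
    """Hypothetical stand-in for a U-Net block: scales every feature."""
    return lambda x: [factor * v for v in x]


def parallel_forward(
    noisy: Vector, ne_stages: List[Stage], se_stages: List[Stage]
) -> Tuple[Vector, Vector]:
    """Run the NE and SE stacks level by level on the same noisy input.

    At each level the NE features are passed across to the SE branch.
    """
    ne_feat, se_feat = noisy, noisy
    for ne_stage, se_stage in zip(ne_stages, se_stages):
        ne_feat = ne_stage(ne_feat)
        # Assumed fusion rule (not from the paper): subtract the NE
        # features from the SE input before the SE block processes it.
        guided = [s - n for s, n in zip(se_feat, ne_feat)]
        se_feat = se_stage(guided)
    return ne_feat, se_feat


# Toy input and single-level branches, for illustration only.
noise_estimate, enhanced = parallel_forward(
    [1.0, 2.0], ne_stages=[scale(0.5)], se_stages=[scale(2.0)]
)
print(noise_estimate)  # [0.5, 1.0]
print(enhanced)        # [1.0, 2.0]
```

In a real system each stage would be a learned convolutional block and both outputs would feed training losses; the sketch only shows that the two branches advance together rather than one after the other.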