🤖 AI Summary
Real-world speech enhancement faces a critical data challenge: large-scale authentic clean-noisy speech pairs are difficult to obtain, and relying on synthetic data introduces a domain mismatch between training and deployment. Method: This paper proposes a dual-branch encoder-decoder framework with adversarial training that leverages unpaired clean speech and noise recordings separately to construct data-driven prior constraints for implicit speech-noise separation. Contribution/Results: Instead of imposing handcrafted model priors, the method defines priors directly from data and, for the first time, reveals that domain alignment between the clean speech data and the target task critically determines performance. Experiments demonstrate state-of-the-art results under fully unsupervised conditions. Crucially, the study exposes that prior works may significantly overestimate performance by using in-domain clean speech to define the prior -- a finding that provides an essential caution for data selection in future research.
📝 Abstract
The majority of deep learning-based speech enhancement methods require paired clean-noisy speech data. Collecting such data at scale in real-world conditions is infeasible, which has led the community to rely on synthetically generated noisy speech. However, this introduces a gap between the training and testing phases. In this work, we propose a novel dual-branch encoder-decoder architecture for unsupervised speech enhancement that separates the input into clean speech and residual noise. Adversarial training is employed to impose priors on each branch, defined by unpaired datasets of clean speech and, optionally, noise. Experimental results show that our method achieves performance comparable to leading unsupervised speech enhancement approaches. Furthermore, we demonstrate the critical impact of clean speech data selection on enhancement performance. In particular, our findings reveal that performance may appear overly optimistic when in-domain clean speech data are used for prior definition -- a practice adopted in previous unsupervised speech enhancement studies.
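The dual-branch idea described above can be sketched in code. The paper does not publish this exact implementation; the following is a minimal, hypothetical PyTorch sketch assuming a shared encoder, two convolutional decoders (one per branch), a mixture-consistency reconstruction loss, and least-squares-GAN-style adversarial terms that push each branch toward its unpaired prior. All layer sizes, the `lam` weight, and the LSGAN formulation are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn

class DualBranchSE(nn.Module):
    """Shared encoder with two decoder branches: estimated clean speech
    and estimated residual noise (a sketch, not the authors' model)."""

    def __init__(self, ch: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, ch, kernel_size=15, stride=2, padding=7), nn.ReLU(),
            nn.Conv1d(ch, ch, kernel_size=15, stride=2, padding=7), nn.ReLU(),
        )

        def make_decoder() -> nn.Sequential:
            return nn.Sequential(
                nn.ConvTranspose1d(ch, ch, kernel_size=16, stride=2, padding=7), nn.ReLU(),
                nn.ConvTranspose1d(ch, 1, kernel_size=16, stride=2, padding=7),
            )

        self.speech_dec = make_decoder()  # clean-speech branch
        self.noise_dec = make_decoder()   # residual-noise branch

    def forward(self, x: torch.Tensor):
        z = self.encoder(x)
        return self.speech_dec(z), self.noise_dec(z)

def generator_loss(x, s_hat, n_hat, d_speech, d_noise, lam: float = 1.0):
    """Mixture consistency (the two branches must sum back to the input)
    plus adversarial priors: each branch tries to fool a discriminator
    trained on unpaired clean speech / noise (LSGAN form, an assumption)."""
    recon = torch.mean((s_hat + n_hat - x) ** 2)
    adv = (torch.mean((d_speech(s_hat) - 1.0) ** 2)
           + torch.mean((d_noise(n_hat) - 1.0) ** 2))
    return recon + lam * adv

if __name__ == "__main__":
    model = DualBranchSE()
    x = torch.randn(2, 1, 1024)          # batch of mono waveform chunks
    s_hat, n_hat = model(x)
    # placeholder discriminators for the shape check only
    d_stub = lambda t: torch.zeros(t.shape[0])
    print(s_hat.shape, n_hat.shape, generator_loss(x, s_hat, n_hat, d_stub, d_stub).item())
```

In this sketch the discriminators (not shown) would be trained in alternation, each seeing only its own unpaired dataset as "real" examples; that is how the unpaired clean speech and noise recordings define the data-driven priors without requiring paired clean-noisy mixtures.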