Few-step Adversarial Schrödinger Bridge for Generative Speech Enhancement

📅 2025-06-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing diffusion models and Schrödinger Bridge (SB) methods for speech enhancement require more than 50 sampling steps, which makes inference slow; when the step count is reduced, their performance degrades severely, particularly at low SNR. This work proposes the first integration of SB theory with generative adversarial networks (GANs) to construct an end-to-end differentiable, few-step reversible generative framework: SB theory is employed to model the prior distribution, while adversarial training enhances single-step reconstruction fidelity and ensures alignment between the generated and real speech distributions. Evaluated on full-band speech enhancement, the method achieves state-of-the-art performance with only one inference step, outperforming mainstream multi-step diffusion and SB baselines. It yields significant improvements in denoising (PESQ +1.2) and dereverberation (STOI +3.8%), effectively breaking the quality bottleneck inherent in few-step sampling.

📝 Abstract

Deep generative models have recently been employed for speech enhancement to generate perceptually valid clean speech on large-scale datasets. Several diffusion models have been proposed, and more recently, a tractable Schrödinger Bridge has been introduced to transport between the clean and noisy speech distributions. However, these models often suffer from an iterative reverse process and require a large number of sampling steps -- more than 50. Our investigation reveals that the performance of baseline models significantly degrades when the number of sampling steps is reduced, particularly under low-SNR conditions. We propose integrating Schrödinger Bridge with GANs to effectively mitigate this issue, achieving high-quality outputs on full-band datasets while substantially reducing the required sampling steps. Experimental results demonstrate that our proposed model outperforms existing baselines, even with a single inference step, in both denoising and dereverberation tasks.

Problem

Research questions and friction points this paper is trying to address.

Reduce sampling steps in speech enhancement models

Improve performance under low-SNR conditions

Enhance speech quality with fewer inference steps

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates Schrödinger Bridge with GANs

Reduces sampling steps significantly

Enhances speech under low-SNR conditions

🔎 Similar Papers

Investigating Training Objectives for Generative Speech Enhancement