DeepFilterGAN: A Full-band Real-time Speech Enhancement System with GAN-based Stochastic Regeneration

📅 2025-05-29
🤖 AI Summary
Predictive speech enhancement models estimate the mean of the target distribution, which can lead to over-suppression, i.e., the removal of speech content. To address this, the paper proposes a full-band speech enhancement system designed for real-time streaming. It pairs a lightweight predictive first stage with a GAN-based generative stage within the stochastic regeneration framework, so that the output reflects the target distribution rather than only its mean. The system also uses noisy conditioning, which an ablation study shows to be important. The model has 3.58M parameters and low latency. Experiments show that it improves over the first stage alone in terms of NISQA-MOS. The authors participated in the 2025 Urgent Challenge with this model and later made further improvements.

📝 Abstract
In this work, we propose a full-band real-time speech enhancement system with GAN-based stochastic regeneration. Predictive models focus on estimating the mean of the target distribution, whereas generative models aim to learn the full distribution. This behavior of predictive models may lead to over-suppression, i.e., the removal of speech content. Prior work has shown that combining a predictive model with a generative one within the stochastic regeneration framework can reduce distortion in the output. We use this framework to obtain a real-time speech enhancement system. With 3.58M parameters and low latency, our system has a lightweight architecture designed for real-time streaming. Experiments show that our system improves over the first stage in terms of the NISQA-MOS metric. Finally, through an ablation study, we show the importance of noisy conditioning in our system. We participated in the 2025 Urgent Challenge with our model and later made further improvements.
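The contrast between mean estimation and distribution learning can be illustrated with a toy example. The sketch below is not the paper's model: it assumes a hypothetical ambiguous frame whose clean magnitude is 1.0 (speech present) with probability 0.6 and 0.0 otherwise, and all names in it are illustrative. It shows that the point estimate minimizing mean squared error lands near the mean (about 0.6), an attenuated value that never occurs in the clean data, whereas a draw from the distribution returns one of the plausible values.

```python
import random


def mse(estimate, samples):
    """Mean squared error of a single point estimate against all samples."""
    return sum((estimate - s) ** 2 for s in samples) / len(samples)


def best_point_estimate(samples, grid_steps=200):
    """Grid search for the constant that minimizes MSE (a stand-in for a predictive model)."""
    lo, hi = min(samples), max(samples)
    candidates = [lo + (hi - lo) * i / grid_steps for i in range(grid_steps + 1)]
    return min(candidates, key=lambda c: mse(c, samples))


rng = random.Random(0)

# Assumption (toy data): the clean magnitude is 1.0 with probability 0.6, else 0.0.
clean_samples = [1.0 if rng.random() < 0.6 else 0.0 for _ in range(5000)]

sample_mean = sum(clean_samples) / len(clean_samples)

# Predictive behavior: the MSE-optimal output sits at the mean, between the two modes.
predictive_output = best_point_estimate(clean_samples)

# Generative behavior (idealized): a sample drawn from the target distribution itself.
generative_output = rng.choice(clean_samples)

print(f"sample mean:        {sample_mean:.3f}")
print(f"predictive output:  {predictive_output:.3f}")
print(f"generative output:  {generative_output:.1f}")
```

In this toy setting the predictive output scales speech down whenever the input is ambiguous, which is a simplified picture of the over-suppression described in the abstract.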
Problem

Research questions and friction points this paper is trying to address.

Develop real-time full-band speech enhancement system
Reduce distortion using GAN-based stochastic regeneration
Achieve lightweight low-latency streaming architecture
Innovation

Methods, ideas, or system contributions that make the work stand out.

GAN-based stochastic regeneration for speech enhancement
Lightweight real-time streaming architecture
Noisy conditioning improves system performance
Sanberk Serbest
Electrical and Electronics Engineering, École Polytechnique Fédérale de Lausanne (EPFL), Switzerland
Tijana Stojkovic
Audio Machine Learning, Logitech, Switzerland
Milos Cernak
Logitech, EPFL - Quartier de l'Innovation
Meeting Speech, Speech Analysis-Synthesis and Coding, Pathological Speech Processing, Artificial Intelligence
Andrew Harper
Audio Machine Learning, Logitech, Switzerland