GAP-URGENet: A Generative-Predictive Fusion Framework for Universal Speech Enhancement

📅 2026-04-02
📈 Citations: 0 (Influential: 0)
🤖 AI Summary
This work addresses a longstanding challenge in universal speech enhancement: achieving robustness and high perceptual quality at the same time. To this end, we propose a generative-predictive fusion framework with two branches. A generative branch performs full-stack speech restoration in a self-supervised representation domain and reconstructs the waveform with a neural vocoder, while a predictive branch performs enhancement in the spectrogram domain. A post-processing module fuses the outputs of both branches and performs bandwidth extension to 48 kHz, after which the signal is downsampled to its original sampling rate. The approach combines generative and predictive enhancement pathways, drawing on neural vocoders, self-supervised representation learning, and bandwidth extension. In the blind-test phase of Track 1 of the ICASSP 2026 URGENT Challenge, the system achieved top performance and ranked 1st in the objective evaluation.
📝 Abstract
We introduce GAP-URGENet, a generative-predictive fusion framework developed for Track 1 of the ICASSP 2026 URGENT Challenge. The system integrates a generative branch, which performs full-stack speech restoration in a self-supervised representation domain and reconstructs the waveform via a neural vocoder, with a predictive branch that performs spectrogram-domain enhancement and provides complementary cues. Outputs from both branches are fused by a post-processing module, which also performs bandwidth extension to generate the enhanced waveform at 48 kHz; this waveform is then downsampled to the original sampling rate. This generative-predictive fusion improves robustness and perceptual quality, achieving top performance in the blind-test phase and ranking 1st in the objective evaluation. Audio examples are available at https://xiaobin-rong.github.io/gap-urgenet_demo.
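The data flow described in the abstract (two parallel branches, a fusion step that outputs 48 kHz audio, then downsampling back to the input rate) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. All function names are hypothetical. Both branches are identity placeholders and are assumed to return audio at the input rate. The "fusion" is a fixed weighted average standing in for the learned post-processing module. Naive linear interpolation stands in for proper resampling; it adds no high-frequency content, so it is not bandwidth extension.

```python
from typing import Callable, List

Signal = List[float]
Branch = Callable[[Signal, int], Signal]

TARGET_SR = 48_000  # the abstract states the fused output is generated at 48 kHz


def resample_linear(x: Signal, sr_in: int, sr_out: int) -> Signal:
    """Naive linear-interpolation resampler (no anti-aliasing filter); illustration only."""
    if sr_in == sr_out or not x:
        return list(x)
    n_out = max(1, round(len(x) * sr_out / sr_in))
    out: Signal = []
    for i in range(n_out):
        t = i * sr_in / sr_out  # position of output sample i, in input-sample units
        j = int(t)
        if j >= len(x) - 1:
            out.append(x[-1])  # hold the last sample past the end of the input
        else:
            frac = t - j
            out.append((1.0 - frac) * x[j] + frac * x[j + 1])
    return out


def generative_branch(x: Signal, sr: int) -> Signal:
    # Hypothetical placeholder for SSL-domain restoration + neural vocoder.
    return list(x)


def predictive_branch(x: Signal, sr: int) -> Signal:
    # Hypothetical placeholder for spectrogram-domain enhancement.
    return list(x)


def fuse_and_extend(gen: Signal, pred: Signal, sr: int, alpha: float = 0.5) -> Signal:
    # Hypothetical fusion: a fixed weighted average, then upsampling to 48 kHz.
    # The paper's module is a learned network that also performs bandwidth extension.
    n = min(len(gen), len(pred))
    fused = [alpha * g + (1.0 - alpha) * p for g, p in zip(gen[:n], pred[:n])]
    return resample_linear(fused, sr, TARGET_SR)


def enhance(
    x: Signal,
    sr: int,
    gen: Branch = generative_branch,
    pred: Branch = predictive_branch,
) -> Signal:
    """Run both branches, fuse at 48 kHz, then downsample to the input rate."""
    y48 = fuse_and_extend(gen(x, sr), pred(x, sr), sr)
    return resample_linear(y48, TARGET_SR, sr)


if __name__ == "__main__":
    import math

    sr = 16_000
    x = [math.sin(2 * math.pi * 440 * n / sr) for n in range(160)]
    y = enhance(x, sr)
    print(len(y))  # same length as the input: 160
```

With the placeholder branches, `enhance` returns a signal at the same rate and length as its input. Swapping in real models for the two branches and a learned fusion/bandwidth-extension network would leave this sampling-rate bookkeeping unchanged.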
Problem

Research questions and friction points this paper is trying to address.

speech enhancement
universal speech restoration
generative-predictive fusion
bandwidth extension
robust speech processing
Innovation

Methods, ideas, or system contributions that make the work stand out.

generative-predictive fusion
self-supervised representation
neural vocoder
spectrogram-domain enhancement
bandwidth extension
Xiaobin Rong
Key Laboratory of Modern Acoustics, Nanjing University; NJU-Horizon Intelligent Audio Lab, Horizon Robotics
Yushi Wang
Tsinghua University
Robotics
Zheng Wang
Key Laboratory of Modern Acoustics, Nanjing University; NJU-Horizon Intelligent Audio Lab, Horizon Robotics
Jing Lu
University of California, Santa Barbara
Electronics, MOCVD material growth