🤖 AI Summary
This work addresses the longstanding challenge in general-purpose speech enhancement of achieving robustness and high perceptual quality simultaneously. To this end, we propose a generative-predictive fusion framework that performs full-stack speech restoration in the self-supervised representation domain and enhancement in the spectrogram domain, followed by a bandwidth-extension post-processing module that fuses the outputs of both branches and upsamples the signal to 48 kHz. To our knowledge, this is the first approach to jointly integrate generative and predictive enhancement pathways, leveraging neural vocoders, self-supervised representation learning, and bandwidth extension. Evaluated on the ICASSP 2026 URGENT Challenge Track 1 blind test, the proposed method achieves state-of-the-art performance on both objective and subjective metrics, significantly outperforming existing approaches.
📝 Abstract
We introduce GAP-URGENet, a generative-predictive fusion framework developed for Track 1 of the ICASSP 2026 URGENT Challenge. The system combines a generative branch, which performs full-stack speech restoration in a self-supervised representation domain and reconstructs the waveform via a neural vocoder, with a predictive branch that performs spectrogram-domain enhancement and provides complementary cues. Outputs from both branches are fused by a post-processing module, which also performs bandwidth extension to produce the enhanced waveform at 48 kHz; the result is then downsampled to the original sampling rate. This generative-predictive fusion improves robustness and perceptual quality, achieving top performance in the blind-test phase and ranking 1st in the objective evaluation. Audio examples are available at https://xiaobin-rong.github.io/gap-urgenet_demo.
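The two-branch pipeline described above can be sketched as follows. This is a minimal, illustrative skeleton, not the authors' implementation: the branch functions are stand-ins for the SSL-domain restoration/vocoder path and the spectrogram-domain enhancer, and the learned fusion/bandwidth-extension module is replaced by a fixed blend plus linear-interpolation upsampling. All function names and parameters are hypothetical.

```python
import numpy as np

def generative_branch(wav: np.ndarray) -> np.ndarray:
    """Stand-in for SSL-domain restoration + neural vocoder resynthesis.
    Here: a light moving-average smoothing, purely illustrative."""
    kernel = np.ones(3) / 3.0
    return np.convolve(wav, kernel, mode="same")

def predictive_branch(wav: np.ndarray) -> np.ndarray:
    """Stand-in for spectrogram-domain enhancement (e.g., mask estimation).
    Here: crude magnitude clipping in the frequency domain, illustrative only."""
    spec = np.fft.rfft(wav)
    mag, phase = np.abs(spec), np.angle(spec)
    mag = np.minimum(mag, np.percentile(mag, 99))  # toy "denoising" step
    return np.fft.irfft(mag * np.exp(1j * phase), n=len(wav))

def fuse_and_extend(gen: np.ndarray, pred: np.ndarray,
                    sr_in: int, sr_out: int = 48_000,
                    alpha: float = 0.5) -> np.ndarray:
    """Blend the two branch outputs, then upsample to sr_out.
    The fixed blend weight and linear interpolation are placeholders for
    the learned fusion / bandwidth-extension post-processing module."""
    fused = alpha * gen + (1.0 - alpha) * pred
    n_out = int(round(len(fused) * sr_out / sr_in))
    t_in = np.arange(len(fused)) / sr_in
    t_out = np.arange(n_out) / sr_out
    return np.interp(t_out, t_in, fused)

# Toy usage: 1 s of a noisy 440 Hz tone at 16 kHz.
sr = 16_000
rng = np.random.default_rng(0)
noisy = np.sin(2 * np.pi * 440 * np.arange(sr) / sr) + 0.1 * rng.standard_normal(sr)

enhanced_48k = fuse_and_extend(generative_branch(noisy), predictive_branch(noisy), sr)
enhanced = enhanced_48k[:: 48_000 // sr]  # back to the original sampling rate
```

In the actual system the fusion and bandwidth extension are learned jointly; the sketch only makes the data flow concrete: two enhancement paths on the same input, a fusion step, upsampling to 48 kHz, and a final downsample to the source rate.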