A Hybrid Discriminative and Generative System for Universal Speech Enhancement

📅 2026-01-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses universal speech enhancement under diverse distortions and recording conditions by proposing a hybrid architecture that combines discriminative and generative modeling. A discriminative TF-GridNet branch preserves signal fidelity, while an autoregressive generative branch reconstructs fine-grained detail. A fusion network learns adaptive weights for combining the two outputs. The system also uses a sampling-frequency-independent strategy to handle varying input sampling rates, and it is trained with a comprehensive Speech Quality Assessment (SQA) loss alongside signal-level losses. The method placed third in Track 1 of the ICASSP 2026 URGENT Challenge.
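The summary does not say how the sampling-rate-agnostic strategy is implemented. One common way to make an STFT-based model independent of sampling rate is to fix the analysis window and hop in milliseconds rather than in samples, so the bin spacing in Hz stays the same and only the number of bins changes. The sketch below illustrates that idea only; the function name `stft_params` and the 32 ms / 16 ms values are assumptions chosen for illustration, not settings taken from the paper.

```python
def stft_params(sample_rate_hz, window_ms=32.0, hop_ms=16.0):
    """Derive STFT sizes from fixed durations (hypothetical helper).

    window_ms and hop_ms are illustrative defaults, not values from the paper.
    """
    win = round(sample_rate_hz * window_ms / 1000)  # window length in samples
    hop = round(sample_rate_hz * hop_ms / 1000)     # hop length in samples
    return {
        "win": win,
        "hop": hop,
        "bins": win // 2 + 1,                # one-sided spectrum size
        "bin_width_hz": sample_rate_hz / win,  # spacing between adjacent bins
    }


if __name__ == "__main__":
    for sr in (8000, 16000, 48000):
        print(sr, stft_params(sr))
```

Under these assumptions the bin width is identical at 8, 16, and 48 kHz, while the number of bins grows with the sampling rate, so a model that processes each frequency bin with shared weights can accept any of these inputs.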

📝 Abstract

Universal speech enhancement aims to handle inputs with various speech distortions and recording conditions. In this work, we propose a novel hybrid architecture that combines the signal fidelity of discriminative modeling with the reconstruction capabilities of generative modeling. Our system uses the discriminative TF-GridNet model with the Sampling-Frequency-Independent strategy to handle variable sampling rates universally. In parallel, an autoregressive model combined with spectral mapping generates detail-rich speech while effectively suppressing generative artifacts. Finally, a fusion network learns adaptive weights for the two outputs, optimized with signal-level losses and the comprehensive Speech Quality Assessment (SQA) loss. Our proposed system was evaluated in the ICASSP 2026 URGENT Challenge (Track 1) and ranked third.
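The abstract does not specify the form of the fusion network. As a minimal sketch of what "adaptive weights of the two outputs" could look like, the code below blends the two branch outputs with a per-sample weight in (0, 1) obtained from a sigmoid. The function `fuse` and the `logits` input are hypothetical: in a real system the logits would come from a trained network, and the fusion might operate on spectrograms rather than waveform samples.

```python
import math


def sigmoid(x):
    """Map a real-valued logit to a weight in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse(disc, gen, logits):
    """Blend two enhanced signals element by element (illustrative only).

    disc   : output of the discriminative branch
    gen    : output of the generative branch
    logits : hypothetical per-sample scores; w = sigmoid(logit) weights disc,
             and (1 - w) weights gen
    """
    if not (len(disc) == len(gen) == len(logits)):
        raise ValueError("all inputs must have the same length")
    fused = []
    for d, g, z in zip(disc, gen, logits):
        w = sigmoid(z)
        fused.append(w * d + (1.0 - w) * g)
    return fused


if __name__ == "__main__":
    disc_out = [0.2, 0.4, -0.1]
    gen_out = [0.0, 0.8, -0.3]
    # Zero logits give equal weight to both branches.
    print(fuse(disc_out, gen_out, [0.0, 0.0, 0.0]))
```

Because each weight lies strictly between 0 and 1, every fused value stays between the two branch outputs; a large positive logit favors the discriminative branch and a large negative one favors the generative branch.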

Problem

Research questions and friction points this paper is trying to address.

universal speech enhancement

speech distortions

recording conditions

variable sampling rates

Innovation

Methods, ideas, or system contributions that make the work stand out.

hybrid architecture

universal speech enhancement

TF-GridNet

autoregressive generative modeling