A Hybrid Discriminative and Generative System for Universal Speech Enhancement

📅 2026-01-27
🤖 AI Summary
This work addresses universal speech enhancement under diverse distortions and complex recording conditions by proposing a hybrid architecture that integrates discriminative and generative modeling. The framework employs TF-GridNet to preserve signal fidelity while leveraging an autoregressive generative model to enhance fine-grained reconstruction. An adaptive fusion mechanism combines the outputs of the two components. The approach also incorporates a sampling-rate-agnostic strategy and a comprehensive Speech Quality Assessment (SQA) loss function, enabling robust processing across varying input sampling rates and optimizing speech quality across multiple dimensions. The proposed method achieved third place in Track 1 of the ICASSP 2026 URGENT Challenge.
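The summary does not say how the sampling-rate-agnostic strategy is implemented. One common way to make an STFT front end independent of sampling rate is to fix the analysis window and hop in milliseconds rather than in samples, so that the frequency resolution in Hz stays the same for every input rate. The sketch below illustrates that idea only; the function name `sfi_stft_params` and the 32 ms / 16 ms defaults are assumptions for illustration, not values taken from the paper.

```python
def sfi_stft_params(sample_rate_hz, window_ms=32.0, hop_ms=16.0):
    """Derive STFT sizes from fixed durations (hypothetical helper).

    Assumption: the window and hop are specified in milliseconds, so the
    number of samples per frame scales with the sampling rate while the
    bin spacing in Hz stays (approximately) constant.
    """
    if sample_rate_hz <= 0:
        raise ValueError("sample_rate_hz must be positive")
    win = int(round(sample_rate_hz * window_ms / 1000.0))  # samples per frame
    hop = int(round(sample_rate_hz * hop_ms / 1000.0))     # samples per hop
    return {
        "win_length": win,
        "hop_length": hop,
        "n_bins": win // 2 + 1,            # one-sided spectrum size
        "bin_hz": sample_rate_hz / win,    # frequency resolution
    }


if __name__ == "__main__":
    for fs in (8000, 16000, 48000):
        print(fs, sfi_stft_params(fs))
```

Under these assumptions a higher sampling rate simply yields more frequency bins at the same spacing, which is what lets a single model treat inputs at different rates uniformly.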

📝 Abstract
Universal speech enhancement aims to handle inputs with various speech distortions and recording conditions. In this work, we propose a novel hybrid architecture that combines the signal fidelity of discriminative modeling with the reconstruction capabilities of generative modeling. Our system uses the discriminative TF-GridNet model with the Sampling-Frequency-Independent strategy to handle variable sampling rates universally. In parallel, an autoregressive model combined with spectral mapping modeling generates detail-rich speech while effectively suppressing generative artifacts. Finally, a fusion network learns adaptive weights for the two outputs, optimized with signal-level losses and the comprehensive Speech Quality Assessment (SQA) loss. Our proposed system was evaluated in the ICASSP 2026 URGENT Challenge (Track 1) and ranked third.
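The fusion step described in the abstract can be pictured as a per-element convex combination of the discriminative and generative outputs. The sketch below is a minimal illustration under stated assumptions: the weights come from a sigmoid over hand-supplied logits, whereas in the paper they are produced by a learned fusion network whose architecture is not given here. The names `fuse_outputs` and `sigmoid`, and the use of flat lists of real values, are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real logit to a weight in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse_outputs(discriminative, generative, logits):
    """Blend two enhanced signals element by element (hypothetical helper).

    Assumption: each logit stands in for what a fusion network would
    predict. A weight w = sigmoid(logit) near 1 favors the discriminative
    output; a weight near 0 favors the generative output.
    """
    if not (len(discriminative) == len(generative) == len(logits)):
        raise ValueError("all inputs must have the same length")
    fused = []
    for d, g, z in zip(discriminative, generative, logits):
        w = sigmoid(z)
        fused.append(w * d + (1.0 - w) * g)
    return fused


if __name__ == "__main__":
    disc = [1.0, 2.0, 3.0]
    gen = [3.0, 4.0, 5.0]
    # A logit of 0 gives equal weighting; large positive logits favor `disc`.
    print(fuse_outputs(disc, gen, [0.0, 0.0, 50.0]))
```

Because each weight lies strictly between 0 and 1, every fused value stays between the two candidate outputs, so the blend can lean on the higher-fidelity branch in some regions and on the more detailed branch in others.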
Problem

Research questions and friction points this paper is trying to address.

universal speech enhancement
speech distortions
recording conditions
variable sampling rates
Innovation

Methods, ideas, or system contributions that make the work stand out.

hybrid architecture
universal speech enhancement
TF-GridNet
autoregressive generative modeling
Speech Quality Assessment (SQA)
Yinghao Liu
Intelligent Connectivity, Alibaba Group
Chengwei Liu
Research Assistant Professor, Nanyang Technological University
Open Source Security, Software Supply Chain Security, Program Analysis, Software Maintenance
Xiaotao Liang
Intelligent Connectivity, Alibaba Group
Haoyin Yan
Intelligent Connectivity, Alibaba Group; Tongyi AI Lab, Alibaba Group
Shaofei Xue
Intelligent Connectivity, Alibaba Group; Tongyi AI Lab, Alibaba Group
Zheng Xue
Intelligent Connectivity, Alibaba Group