TS-URGENet: A Three-stage Universal Robust and Generalizable Speech Enhancement Network

📅 2025-05-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses universal speech enhancement under diverse distortions (packet loss, noise, reverberation, clipping, and codec artifacts) and heterogeneous input formats. We propose a three-stage cascaded architecture: a filling stage, which reconstructs frames lost to packet loss; a separation stage, which jointly suppresses noise, reverberation, and clipping distortion; and a restoration stage, which compensates for bandwidth limitation, codec artifacts, and residual packet loss distortion. The method employs joint time-frequency modeling, integrating mask estimation, spectral interpolation, and adaptive residual reconstruction. To our knowledge, this is the first framework enabling hierarchical, cooperative modeling and correction of heterogeneous distortions. Evaluated on Track 1 of the Interspeech 2025 URGENT Challenge, our approach ranks second, improving PESQ (+1.12), STOI (+0.08), and DNSMOS (+0.34) over baselines. It generalizes to unseen distortion combinations and remains robust across input formats and degradation severities.

📝 Abstract
Universal speech enhancement aims to handle input speech with different distortions and input formats. To tackle this challenge, we present TS-URGENet, a Three-Stage Universal, Robust, and Generalizable speech Enhancement Network. To address various distortions, the proposed system employs a novel three-stage architecture consisting of a filling stage, a separation stage, and a restoration stage. The filling stage mitigates packet loss by preliminarily filling lost regions under noise interference, ensuring signal continuity. The separation stage suppresses noise, reverberation, and clipping distortion to improve speech clarity. Finally, the restoration stage compensates for bandwidth limitation, codec artifacts, and residual packet loss distortion, refining the overall speech quality. Our proposed TS-URGENet achieved outstanding performance in the Interspeech 2025 URGENT Challenge, ranking 2nd in Track 1.
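The cascade described above can be illustrated with a minimal sketch. This is not the authors' implementation: the three functions below are toy stand-ins (hypothetical names `fill_stage`, `separation_stage`, `restoration_stage`, and an assumed frame length of 160 samples) that only show how a signal could flow through a filling, a separation, and a restoration stage in order. The actual stages in TS-URGENet are neural networks whose details are not given in this summary.

```python
import numpy as np

def fill_stage(x, frame=160):
    """Toy stand-in for the filling stage: copy the previous frame
    into any frame that is entirely zero (treated here as 'lost')."""
    y = x.copy()
    n_frames = len(y) // frame
    for i in range(1, n_frames):
        cur = slice(i * frame, (i + 1) * frame)
        prev = slice((i - 1) * frame, i * frame)
        if not np.any(y[cur]):
            y[cur] = y[prev]
    return y

def separation_stage(x, threshold=0.05):
    """Toy stand-in for the separation stage: soft-threshold samples
    to suppress low-level interference."""
    return np.sign(x) * np.maximum(np.abs(x) - threshold, 0.0)

def restoration_stage(x, peak=0.9):
    """Toy stand-in for the restoration stage: rescale to a target peak."""
    m = np.max(np.abs(x))
    return x if m == 0 else x * (peak / m)

def enhance(x):
    """Apply the three stages in the order given in the abstract."""
    for stage in (fill_stage, separation_stage, restoration_stage):
        x = stage(x)
    return x

if __name__ == "__main__":
    t = np.arange(480)
    noisy = 0.5 * np.sin(2 * np.pi * 440 * t / 16000)
    noisy[160:320] = 0.0  # simulate one lost packet
    out = enhance(noisy)
    print(out.shape, float(np.max(np.abs(out))))
```

The ordering reflects the abstract's description: lost regions are filled first so that later stages see a continuous signal, and restoration runs last so it can refine whatever the earlier stages leave behind.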
Problem

Research questions and friction points this paper is trying to address.

Universal speech enhancement for diverse distortions and formats
Three-stage architecture addressing packet loss, noise, and bandwidth issues
Improving speech clarity and quality under multiple distortions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Three-stage architecture for speech enhancement
Filling stage mitigates packet loss by filling lost regions under noise interference
Separation stage improves clarity; restoration stage refines overall quality
Xiaobin Rong
Key Laboratory of Modern Acoustics, Nanjing University, China; NJU-Horizon Intelligent Audio Lab, Horizon Robotics, China
Dahan Wang
Key Laboratory of Modern Acoustics, Nanjing University, China; NJU-Horizon Intelligent Audio Lab, Horizon Robotics, China
Qinwen Hu
Key Laboratory of Modern Acoustics, Nanjing University, China; NJU-Horizon Intelligent Audio Lab, Horizon Robotics, China
Yushi Wang
Tsinghua University
Robotics
Yuxiang Hu
NJU-Horizon Intelligent Audio Lab, Horizon Robotics, China
Jing Lu
University of California, Santa Barbara
Electronics, MOCVD material growth