Neural Vocoders as Speech Enhancers

📅 2025-01-23
🤖 AI Summary
This work investigates what speech enhancement and neural vocoding have in common and finds that both exhibit a similar low-rank degradation. Building on this insight, we propose the first joint modeling framework in which a single deep neural network performs both speech denoising/enhancement and high-fidelity waveform synthesis. The approach combines a low-rank feature representation, a multi-objective loss, and end-to-end joint optimization to exploit the consistent rank behavior of the two tasks in the spectral domain. Experiments show that the unified model performs on par with dedicated task-specific models on standard metrics (PESQ, STOI, STFT-MSE), supporting the hypothesis that speech restoration tasks admit a unified formulation. The result bridges two traditionally separate subfields of speech processing through a shared low-rank perspective.

📝 Abstract
Speech enhancement (SE) and neural vocoding are traditionally viewed as separate tasks. In this work, we observe them under a common thread: the rank behavior of these processes. This observation prompts two key questions: "Can a model designed for one task's rank degradation be adapted for the other?" and "Is it possible to address both tasks using a unified model?" Our empirical findings demonstrate that existing speech enhancement models can be successfully trained to perform vocoding tasks, and a single model, when jointly trained, can effectively handle both tasks with performance comparable to separately trained models. These results suggest that speech enhancement and neural vocoding can be unified under a broader framework of speech restoration. Code: https://github.com/Andong-Li-speech/Neural-Vocoders-as-Speech-Enhancers.
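
The "rank behavior" mentioned in the abstract can be made concrete with a small numerical sketch. The example below is not the authors' code. It assumes that a vocoder receives a band-compressed spectrogram (the abstract does not say this), and it replaces a real mel filterbank with a hypothetical, linearly spaced triangular filterbank (`triangular_filterbank`). The sizes (257 bins, 80 bands, 400 frames) and the random "spectrogram" are illustrative assumptions as well. The point is only that compressing to 80 bands and mapping back to 257 bins cannot yield a matrix of rank above 80.

```python
import numpy as np


def triangular_filterbank(n_bands: int, n_bins: int) -> np.ndarray:
    """Hypothetical stand-in for a mel filterbank: linearly spaced,
    overlapping triangular filters with shape (n_bands, n_bins)."""
    centers = np.linspace(0, n_bins - 1, n_bands + 2)
    bins = np.arange(n_bins)
    fb = np.zeros((n_bands, n_bins))
    for i in range(n_bands):
        left, mid, right = centers[i], centers[i + 1], centers[i + 2]
        rising = (bins - left) / (mid - left)
        falling = (right - bins) / (right - mid)
        # A triangle is the pointwise minimum of the two slopes, floored at 0.
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


# Illustrative sizes (assumptions, not values taken from the paper).
n_bins, n_frames, n_bands = 257, 400, 80

rng = np.random.default_rng(0)
# Random non-negative matrix used as a full-rank stand-in for a magnitude spectrogram.
spec = rng.random((n_bins, n_frames))

fb = triangular_filterbank(n_bands, n_bins)
compressed = fb @ spec                      # (n_bands, n_frames)
restored = np.linalg.pinv(fb) @ compressed  # back to (n_bins, n_frames)

rank_full = np.linalg.matrix_rank(spec)
rank_restored = np.linalg.matrix_rank(restored)

print("rank of original spectrogram:", rank_full)
print("rank after band compression and pseudo-inverse:", rank_restored)
```

Under these assumptions, a model that maps the band-compressed input back to a full-resolution signal has to recover the lost rank. That is one way to read the analogy with enhancement, where the degradation comes from noise rather than from band compression.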
Problem

Research questions and friction points this paper is trying to address.

Neural vocoders
Speech enhancement
Unified model
Innovation

Methods, ideas, or system contributions that make the work stand out.

Neural vocoder
Speech enhancement
Integrated model
Andong Li
Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Zhihang Sun
Tencent AI Lab, Beijing, China
Fengyuan Hao
Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Xiaodong Li
Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Chengshi Zheng
Institute of Acoustics, Chinese Academy of Sciences
Speech enhancement, microphone array, deep learning