GALD-SE: Guided Anisotropic Lightweight Diffusion for Efficient Speech Enhancement

📅 2024-09-23
🏛️ IEEE Signal Processing Letters
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing speech enhancement methods suffer from high computational overhead under strong noise conditions and neglect the clean cues already present in noisy mixtures, leading to low efficiency and parameter redundancy. To address these issues, we propose GALD-SE, a guided anisotropic lightweight diffusion model that introduces a novel noise-aware anisotropic diffusion mechanism to explicitly preserve and exploit intrinsic clean time-frequency structures during denoising. GALD-SE combines a lightweight neural architecture with joint time-frequency modeling to drastically reduce model size. With only ~4.5 million parameters, it achieves state-of-the-art performance in extremely noisy environments, and its inference is several times faster than mainstream diffusion-based approaches, while demonstrating superior robustness and efficiency compared to both conventional predictive models and existing diffusion-based methods.

📝 Abstract
Speech enhancement is designed to improve the intelligibility and quality of speech across diverse noise conditions. Recently, diffusion models have gained considerable attention in the speech enhancement area, achieving competitive results. Current diffusion-based methods blur the distribution of the signal with isotropic Gaussian noise and recover the clean speech distribution from the prior. However, these methods often suffer from a substantial computational burden. We argue that the computational inefficiency partially stems from the oversight that speech enhancement is not purely a generative task; it primarily involves noise reduction and completion of missing information, while the clean clues in the original mixture do not need to be regenerated. In this paper, we propose a method that introduces noise with anisotropic guidance during the diffusion process, allowing the neural network to preserve clean clues within noisy recordings. This approach substantially reduces computational complexity while exhibiting robustness against various forms of noise and speech distortion. Experiments demonstrate that the proposed method achieves state-of-the-art results with only approximately 4.5 million parameters, a number significantly lower than that required by other diffusion methods. This effectively narrows the model-size disparity between diffusion-based and predictive speech enhancement approaches. Additionally, the proposed method performs well in very noisy scenarios, demonstrating its potential for applications in highly challenging environments.
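The core idea in the abstract, anisotropic noise injection that spares clean time-frequency cues, can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's exact formulation: the per-bin `guidance` mask (here a hard 0/1 split) and the scaling rule are assumptions standing in for whatever noise-aware estimate GALD-SE actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)

def anisotropic_forward_step(x, guidance, sigma):
    """One forward-diffusion step with per-bin (anisotropic) noise scale.

    x        : mixture spectrogram, shape (freq, time)
    guidance : values in [0, 1]; ~1 where a bin looks noise-dominated,
               ~0 where clean speech cues should be preserved
               (hypothetical mask, e.g. from a crude local-SNR estimate)
    sigma    : base noise level for this step
    """
    eps = rng.standard_normal(x.shape)
    # Isotropic diffusion would use `sigma * eps`; here the scale
    # varies per time-frequency bin, so clean cues are not blurred away.
    return x + sigma * guidance * eps

# Toy example: left half of the spectrogram is treated as clean.
spec = np.ones((4, 8))
guidance = np.concatenate([np.zeros((4, 4)), np.ones((4, 4))], axis=1)
noised = anisotropic_forward_step(spec, guidance, sigma=0.5)

# Clean-cue bins pass through unchanged; noise-dominated bins are perturbed,
# so the reverse process only has to regenerate the corrupted regions.
```

Under this sketch the denoiser's job shrinks to the masked regions, which is one intuition for why such a model can be small and fast.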
Problem

Research questions and friction points this paper is trying to address.

Speech Enhancement
Computational Efficiency
Resource Consumption
Innovation

Methods, ideas, or system contributions that make the work stand out.

GALD-SE
Directional Noise Addition
Efficient Speech Enhancement
Chengzhong Wang
Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing 100190, China
Jianjun Gu
Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing 100190, China
Dingding Yao
Institute of Acoustics, Chinese Academy of Sciences
Spatial Hearing · Binaural Technology · Auditory Processing · HRTF
Junfeng Li
Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing 100190, China
Yonghong Yan
University of North Carolina at Charlotte
Parallel and High Performance Computing · Parallel Programming Languages and Compilers · Computer Architecture and Systems · Distri