GALD-SE: Guided Anisotropic Lightweight Diffusion for Efficient Speech Enhancement

📅 2024-09-23
🏛️ IEEE Signal Processing Letters
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing speech enhancement methods suffer from high computational overhead under strong noise conditions and neglect the clean cues already present in noisy mixtures, leading to low efficiency and parameter redundancy. To address these issues, we propose GALD-SE, a guided anisotropic lightweight diffusion model that introduces a novel noise-aware anisotropic diffusion mechanism to explicitly preserve and exploit intrinsic clean time-frequency structures during denoising. GALD-SE combines a lightweight neural architecture with joint time-frequency modeling to drastically reduce model size. With only ~4.5 million parameters, it achieves state-of-the-art performance in extremely noisy environments, and its inference is several times faster than mainstream diffusion-based approaches, while demonstrating superior robustness and efficiency compared to both conventional predictive models and existing diffusion-based methods.

📝 Abstract
Speech enhancement is designed to improve the intelligibility and quality of speech across diverse noise conditions. Recently, diffusion models have gained considerable attention in the speech enhancement area, achieving competitive results. Current diffusion-based methods blur the distribution of the signal with isotropic Gaussian noise and recover the clean speech distribution from the prior. However, these methods often suffer from a substantial computational burden. We argue that the computational inefficiency partially stems from the oversight that speech enhancement is not purely a generative task; it primarily involves noise reduction and completion of missing information, while the clean clues in the original mixture do not need to be regenerated. In this paper, we propose a method that introduces noise with anisotropic guidance during the diffusion process, allowing the neural network to preserve clean clues within noisy recordings. This approach substantially reduces computational complexity while exhibiting robustness against various forms of noise and speech distortion. Experiments demonstrate that the proposed method achieves state-of-the-art results with only approximately 4.5 million parameters, a number significantly lower than that required by other diffusion methods. This effectively narrows the model-size disparity between diffusion-based and predictive speech enhancement approaches. Additionally, the proposed method performs well in very noisy scenarios, demonstrating its potential for applications in highly challenging environments.
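The core idea in the abstract, anisotropic noise injection that spares clean time-frequency cues, can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's exact formulation: the per-bin `guidance` mask (here a hard 0/1 split) and the scaling rule are assumptions standing in for whatever noise-aware estimate GALD-SE actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)

def anisotropic_forward_step(x, guidance, sigma):
    """One forward-diffusion step with per-bin (anisotropic) noise scale.

    x        : mixture spectrogram, shape (freq, time)
    guidance : values in [0, 1]; ~1 where a bin looks noise-dominated,
               ~0 where clean speech cues should be preserved
               (hypothetical mask, e.g. from a crude local-SNR estimate)
    sigma    : base noise level for this step
    """
    eps = rng.standard_normal(x.shape)
    # Isotropic diffusion would use `sigma * eps`; here the scale
    # varies per time-frequency bin, so clean cues are not blurred away.
    return x + sigma * guidance * eps

# Toy example: left half of the spectrogram is treated as clean.
spec = np.ones((4, 8))
guidance = np.concatenate([np.zeros((4, 4)), np.ones((4, 4))], axis=1)
noised = anisotropic_forward_step(spec, guidance, sigma=0.5)

# Clean-cue bins pass through unchanged; noise-dominated bins are perturbed,
# so the reverse process only has to regenerate the corrupted regions.
```

Under this sketch the denoiser's job shrinks to the masked regions, which is one intuition for why such a model can be small and fast.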
Problem

Research questions and friction points this paper is trying to address.

Speech Enhancement
Computational Efficiency
Resource Consumption
Innovation

Methods, ideas, or system contributions that make the work stand out.

GALD-SE
Directional Noise Addition
Efficient Speech Enhancement
Chengzhong Wang
Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing 100190, China
Jianjun Gu
Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing 100190, China
Dingding Yao
Institute of Acoustics, Chinese Academy of Sciences
Spatial Hearing · Binaural Technology · Auditory Processing · HRTF
Junfeng Li
Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing 100190, China
Yonghong Yan
University of North Carolina at Charlotte
Parallel and High Performance Computing · Parallel Programming Languages and Compilers · Computer Architecture and Systems · Distri