SLICE: Speech Enhancement via Layer-wise Injection of Conditioning Embeddings

📅 2026-03-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Real-world speech is often degraded by several factors at once, such as noise, reverberation, and nonlinear distortion, and existing diffusion models struggle under such compound conditions. To address this, the work proposes a layer-wise conditioning injection mechanism: degradation-aware features, extracted by a pretrained multi-task encoder, are injected into the diffusion model's timestep embedding so that they propagate through all residual blocks. This integrates information about multiple degradation types without altering the underlying network architecture. In controlled experiments, the proposed method outperforms baselines that inject conditioning only at the input layer or use no conditioning at all, and it generalizes to diverse real-world recordings.

📝 Abstract

Real-world speech is often corrupted by multiple degradations simultaneously, including additive noise, reverberation, and nonlinear distortion. Diffusion-based enhancement methods perform well on single degradations but struggle with compound corruptions. Prior noise-aware approaches inject conditioning at the input layer only, which can degrade performance below that of an unconditioned model. To address this, we propose injecting degradation conditioning, derived from a pretrained encoder with multi-task heads for noise type, reverberation, and distortion, into the timestep embedding so that it propagates through all residual blocks without architectural changes. In controlled experiments where only the injection method varies, input-level conditioning performs worse than no encoder at all on compound degradations, while layer-wise injection achieves the best results. The method also generalizes to diverse real-world recordings.
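
The injection idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the toy network, the names `LayerwiseConditionedNet`, `ResidualBlock`, and `cond_proj`, the additive combination of conditioning and timestep embedding, and all dimensions are assumptions made for illustration. The `cond` vector stands in for the output of the pretrained degradation encoder, which is not modeled here. The sketch only shows why conditioning that is merged into the timestep embedding reaches every residual block without any change to the blocks themselves.

```python
import math
import random


def sinusoidal_embedding(t, dim):
    """Standard sinusoidal embedding of a scalar diffusion timestep."""
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]


def linear(weights, x):
    """Matrix-vector product; weights is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def random_matrix(rows, cols, rng):
    """Small random weights, used here in place of trained parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class ResidualBlock:
    """Toy residual block that consumes the shared embedding as a bias."""

    def __init__(self, channels, emb_dim, rng):
        self.emb_proj = random_matrix(channels, emb_dim, rng)
        self.mix = random_matrix(channels, channels, rng)

    def __call__(self, h, emb):
        bias = linear(self.emb_proj, emb)            # per-channel shift from the embedding
        pre = [a + b for a, b in zip(h, bias)]
        out = [math.tanh(v) for v in linear(self.mix, pre)]
        return [a + b for a, b in zip(h, out)]       # residual connection


class LayerwiseConditionedNet:
    """Hypothetical network: conditioning is added to the timestep embedding."""

    def __init__(self, channels=4, emb_dim=8, cond_dim=6, n_blocks=3, seed=0):
        rng = random.Random(seed)
        self.emb_dim = emb_dim
        # Assumed projection from the degradation embedding to the timestep-embedding size.
        self.cond_proj = random_matrix(emb_dim, cond_dim, rng)
        self.blocks = [ResidualBlock(channels, emb_dim, rng) for _ in range(n_blocks)]

    def forward(self, x, t, cond=None):
        emb = sinusoidal_embedding(t, self.emb_dim)
        if cond is not None:
            # Assumption: additive injection into the timestep embedding.
            emb = [e + c for e, c in zip(emb, linear(self.cond_proj, cond))]
        h, trace = list(x), []
        for block in self.blocks:                    # every block sees the same embedding
            h = block(h, emb)
            trace.append(h)
        return h, trace


net = LayerwiseConditionedNet()
x = [0.5, -0.2, 0.1, 0.3]
_, plain = net.forward(x, t=10)
_, conditioned = net.forward(x, t=10, cond=[1.0, 0.0, 0.5, 0.0, 1.0, 0.0])
changed = [a != b for a, b in zip(plain, conditioned)]  # one flag per residual block
```

In this sketch, `changed` records whether each block's output moved when the conditioning vector was supplied. Input-level conditioning would instead modify only `x` before the first block, leaving later blocks to see the degradation information only indirectly through the hidden state.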

Problem

Research questions and friction points this paper is trying to address.

speech enhancement

compound degradations

diffusion models

conditioning injection

real-world speech

Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion-based speech enhancement

layer-wise conditioning injection

multi-degradation robustness