SLICE: Speech Enhancement via Layer-wise Injection of Conditioning Embeddings

📅 2026-03-05
🤖 AI Summary
Real-world speech is often degraded by several factors at once, such as noise, reverberation, and nonlinear distortion, and existing diffusion-based enhancement models lose performance under these compound conditions. This work proposes a layer-wise conditioning injection mechanism: degradation-aware features, extracted by a pretrained multi-task encoder, are injected into the diffusion model's timestep embedding so that they reach every residual block. The degradation information is integrated without changing the network architecture. In controlled experiments, the method outperforms baselines that inject conditioning only at the input layer or use no conditioning at all, and it also generalizes to diverse real-world recordings with compound degradations.

📝 Abstract
Real-world speech is often corrupted by multiple degradations simultaneously, including additive noise, reverberation, and nonlinear distortion. Diffusion-based enhancement methods perform well on single degradations but struggle with compound corruptions. Prior noise-aware approaches inject conditioning at the input layer only, which can degrade performance below that of an unconditioned model. To address this, we propose injecting degradation conditioning, derived from a pretrained encoder with multi-task heads for noise type, reverberation, and distortion, into the timestep embedding so that it propagates through all residual blocks without architectural changes. In controlled experiments where only the injection method varies, input-level conditioning performs worse than no encoder at all on compound degradations, while layer-wise injection achieves the best results. The method also generalizes to diverse real-world recordings.
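
The injection path described in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the toy residual block, the additive fusion of the conditioning vector with the timestep embedding, the sinusoidal embedding, and every name and dimension below (ResBlock, denoiser, w_cond, cond_dim, and so on) are assumptions made for illustration. The random vector cond stands in for the output of the pretrained degradation encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

def timestep_embedding(t, dim):
    """Sinusoidal embedding of a scalar diffusion time t (a common choice, assumed here)."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    angles = t * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

class ResBlock:
    """Toy residual block (hypothetical): each block reads the shared embedding."""
    def __init__(self, channels, emb_dim):
        self.w = rng.normal(0.0, 0.1, (channels, channels))
        self.w_emb = rng.normal(0.0, 0.1, (channels, emb_dim))

    def __call__(self, h, emb):
        # h: (channels, frames), emb: (emb_dim,)
        bias = self.w_emb @ emb                      # per-channel shift from the embedding
        return h + np.tanh(self.w @ h + bias[:, None])

def denoiser(x, t, cond, blocks, w_cond, emb_dim):
    """Run the blocks; conditioning enters only through the timestep embedding."""
    emb = timestep_embedding(t, emb_dim)
    if cond is not None:
        # Assumed fusion: project the degradation vector and add it to the embedding.
        emb = emb + w_cond @ cond
    h = x
    for block in blocks:
        h = block(h, emb)                            # same embedding reaches every block
    return h

channels, frames, emb_dim, cond_dim = 8, 16, 32, 6
blocks = [ResBlock(channels, emb_dim) for _ in range(3)]
w_cond = rng.normal(0.0, 0.1, (emb_dim, cond_dim))
x = rng.normal(size=(channels, frames))
cond = rng.normal(size=cond_dim)                     # stand-in for encoder output

out_cond = denoiser(x, 0.5, cond, blocks, w_cond, emb_dim)
out_uncond = denoiser(x, 0.5, None, blocks, w_cond, emb_dim)
```

Because each block already consumes the timestep embedding, the only new parameter in this sketch is the projection w_cond. That mirrors the abstract's claim that the conditioning propagates through all residual blocks without architectural changes. An input-level variant would instead combine cond with x once, before the first block.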
Problem

Research questions and friction points this paper is trying to address.

speech enhancement
compound degradations
diffusion models
conditioning injection
real-world speech
Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion-based speech enhancement
layer-wise conditioning injection
multi-degradation robustness
timestep embedding conditioning
pretrained encoder with multi-task heads