🤖 AI Summary
Supervised dereverberation methods struggle in real-world scenarios because paired clean-reverberant speech data are scarce. Method: This paper proposes an unsupervised learning framework that requires only reverberant speech and a prior acoustic model. Its core innovation is a sequential learning strategy, motivated by a Bayesian formulation of the dereverberation problem, in which deep neural networks estimate both the clean signal and room acoustic parameters (e.g., RT60, room dimensions) from reverberant inputs, guided by a reverberation-matching loss. By eliminating reliance on clean-speech labels, it enables end-to-end training across settings ranging from weakly supervised to fully unsupervised. Contribution/Results: Experiments show that the most data-efficient variant outperforms an unsupervised baseline using only 100 samples annotated with reverberation parameters, delivering robust, interpretable, and generalizable speech dereverberation under low-resource conditions.
📝 Abstract
This paper explores the outcome of training state-of-the-art dereverberation models with supervision settings ranging from weakly-supervised to fully unsupervised, relying solely on reverberant signals and an acoustic model for training. Most existing deep learning approaches require paired dry and reverberant data, which are difficult to obtain in practice. We instead develop a sequential learning strategy motivated by a Bayesian formulation of the dereverberation problem, wherein acoustic parameters and dry signals are estimated from reverberant inputs using deep neural networks, guided by a reverberation matching loss. Our most data-efficient variant requires only 100 reverberation-parameter-labelled samples to outperform an unsupervised baseline, demonstrating the effectiveness and practicality of the proposed method in low-resource scenarios.
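To make the reverberation-matching idea concrete, here is a minimal NumPy sketch (not the paper's implementation): the estimated dry signal is re-reverberated with a room impulse response synthesized from the estimated acoustic parameter (here just RT60), and the loss is the mismatch against the observed reverberant signal. The exponential-decay RIR model and all function names are illustrative assumptions.

```python
import numpy as np

def synth_rir(rt60, fs=16000, length=4000, seed=0):
    """Toy RIR model: white noise shaped by an exponential decay.
    The envelope drops 60 dB (amplitude factor 10^-3) at t = rt60,
    hence the constant ln(10^3) ~= 6.908."""
    rng = np.random.default_rng(seed)
    t = np.arange(length) / fs
    decay = np.exp(-6.908 * t / rt60)
    return rng.standard_normal(length) * decay

def reverb_matching_loss(dry_est, rt60_est, reverberant, fs=16000):
    """MSE between the observed reverberant speech and the dry estimate
    re-reverberated with an RIR synthesized from the estimated RT60."""
    rir = synth_rir(rt60_est, fs)
    resynth = np.convolve(dry_est, rir)[: len(reverberant)]
    return np.mean((reverberant - resynth) ** 2)
```

In the paper's setting the dry signal and RT60 come from neural network estimators and the loss gradient trains them; here, calling `reverb_matching_loss` with the true dry signal and true RT60 yields a smaller loss than with a wrong RT60, which is the signal that drives learning without clean-speech labels.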