🤖 AI Summary
This study investigates whether phase information in the time-frequency (T-F) domain is necessary for weakly supervised speech dereverberation. Based on Statistical Wave Field Theory, we analyze the phase of reverberant speech and find that, except at low frequencies, it is perturbed by white, uniformly distributed noise and therefore carries little useful information for guiding dereverberation.
Method: We adopt a phase-agnostic loss within a recent weak supervision framework: the loss function constrains only the magnitude spectrum and omits the conventional phase-related terms; end-to-end optimization is achieved via T-F transformation coupled with phase masking.
Contribution/Results: Experiments in which no clean speech is available during training show significant improvements in objective metrics (e.g., STOI, PESQ), supporting the claim that the reverberant phase is not essential for dereverberation. The approach offers both a theoretical justification and a practical way to simplify the training objective, reduce computational overhead, and lessen the dependence on clean reference data in weakly supervised dereverberation.
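To make the idea of a magnitude-only loss concrete, the following is a minimal NumPy sketch, not the authors' implementation. The function names (`stft`, `magnitude_loss`, `complex_loss`), the FFT size, the hop length, and the use of a mean squared error are all assumptions chosen for illustration; `ref` stands for whatever reference signal the training framework compares against.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Hann-windowed STFT; returns a (frames, n_fft // 2 + 1) complex array."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] for i in range(n_frames)])
    return np.fft.rfft(frames * window, axis=-1)

def magnitude_loss(est, ref, n_fft=512, hop=128):
    """Mean squared error between STFT magnitudes; the phase is discarded."""
    est_mag = np.abs(stft(est, n_fft, hop))
    ref_mag = np.abs(stft(ref, n_fft, hop))
    return np.mean((est_mag - ref_mag) ** 2)

def complex_loss(est, ref, n_fft=512, hop=128):
    """Mean squared error between complex STFTs; sensitive to the phase."""
    return np.mean(np.abs(stft(est, n_fft, hop) - stft(ref, n_fft, hop)) ** 2)

# Toy check (hypothetical signals): negating a signal keeps every STFT
# magnitude unchanged but shifts every phase by pi.
rng = np.random.default_rng(0)
ref = rng.standard_normal(16000)
est = -ref

mag = magnitude_loss(est, ref)    # ignores the phase difference
cplx = complex_loss(est, ref)     # penalizes the phase difference
print(f"magnitude-only loss: {mag:.3e}, complex loss: {cplx:.3e}")
```

In this toy case the magnitude-only loss vanishes while the complex loss does not, which is the sense in which such a loss is phase-agnostic.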
📝 Abstract
In unsupervised or weakly supervised approaches to speech dereverberation, the target clean (dry) signals are considered unknown during training. In that context, it becomes critical to evaluate how much information can be retrieved from reverberant (wet) speech alone. This work investigates the role of the wet phase in the time-frequency domain. Based on Statistical Wave Field Theory, we show that late reverberation perturbs phase components with white, uniformly distributed noise, except at low frequencies. Consequently, the wet phase carries limited useful information and is not essential for weakly supervised dereverberation. To validate this finding, we train dereverberation models under a recent weak supervision framework and demonstrate that performance can be significantly improved by excluding the reverberant phase from the loss function.
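The claim that a uniformly distributed phase perturbation leaves little usable information can be illustrated with a toy numerical experiment. This is a simplified sketch rather than the paper's derivation: it assumes the perturbation acts additively on the phase itself, and the "dry" phase distribution, the sample count, and the helper names (`wrap`, `resultant_length`) are hypothetical. The mean resultant length is a standard circular statistic that is close to 1 for tightly concentrated phases and close to 0 for uniformly spread ones.

```python
import numpy as np

def wrap(phi):
    """Wrap angles to the interval [-pi, pi)."""
    return (phi + np.pi) % (2 * np.pi) - np.pi

def resultant_length(phi):
    """Mean resultant length: near 1 if concentrated, near 0 if uniform."""
    return np.abs(np.mean(np.exp(1j * phi)))

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical "dry" phases, tightly concentrated around 0.5 rad.
dry_phase = rng.normal(loc=0.5, scale=0.2, size=n)

# Assumed perturbation: independent noise, uniform on [-pi, pi).
noise = rng.uniform(-np.pi, np.pi, size=n)
wet_phase = wrap(dry_phase + noise)

r_dry = resultant_length(dry_phase)
r_wet = resultant_length(wet_phase)
print(f"dry concentration: {r_dry:.3f}, wet concentration: {r_wet:.3f}")
```

Under these assumptions the perturbed phase is itself uniformly distributed regardless of the underlying dry phase, so its concentration collapses toward zero.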