🤖 AI Summary
Existing dereverberation methods rely heavily on scarce clean–reverberant speech pairs, or generalize poorly because their target metrics lack cross-metric consistency, degrading performance on other evaluation criteria. This paper proposes a weakly supervised training paradigm that requires only reverberant speech and coarse RT60 estimates—no paired data. The core contribution is a physics-informed, generative reverberation-matching loss built on synthesized room impulse responses (RIRs): the RT60 estimate guides RIR synthesis, and a waveform-level reconstruction error against the reverberant input is optimized. By embedding acoustic prior knowledge into the neural model, the method improves robustness and generalization. Extensive experiments show consistent superiority over state-of-the-art approaches across multiple objective metrics—including PESQ, STOI, and ESTOI—and, crucially, the method maintains substantial gains under non-target evaluation metrics, supporting its broad applicability and reliability.
📝 Abstract
This paper introduces a new training strategy that improves speech dereverberation systems using only minimal acoustic information and reverberant (wet) speech. Most existing algorithms rely on paired dry/wet data, which is difficult to obtain, or on target metrics that may not adequately capture reverberation characteristics and can lead to poor results on non-target metrics. Our approach uses limited acoustic information, such as the reverberation time (RT60), to train a dereverberation system. The system's output is resynthesized using a generated room impulse response and compared with the original reverberant speech, yielding a novel reverberation-matching loss that replaces standard target-metric objectives. During inference, only the trained dereverberation model is used. Experimental results demonstrate that our method achieves more consistent performance across the objective metrics used in speech dereverberation than the state of the art.
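To make the training signal concrete, here is a minimal NumPy sketch of a reverberation-matching loss of the kind the abstract describes. It is an illustration, not the paper's implementation: the RIR generator (`synth_rir`, exponentially decaying noise shaped so energy falls 60 dB over RT60 seconds) and the plain MSE reconstruction error are simplifying assumptions made for this example.

```python
import numpy as np

def synth_rir(rt60, fs=16000, seed=0):
    """Hypothetical RIR generator: Gaussian noise with an exponential
    envelope whose decay reaches -60 dB at t = rt60 seconds."""
    rng = np.random.default_rng(seed)
    n = int(rt60 * fs)
    t = np.arange(n) / fs
    envelope = 10.0 ** (-3.0 * t / rt60)   # -60 dB when t == rt60
    rir = rng.standard_normal(n) * envelope
    return rir / np.max(np.abs(rir))

def reverb_matching_loss(dry_est, wet, rt60, fs=16000):
    """Re-reverberate the model's dry estimate with a synthesized RIR
    and measure waveform-level reconstruction error vs. the wet input."""
    rir = synth_rir(rt60, fs)
    resynth = np.convolve(dry_est, rir)[: len(wet)]
    return float(np.mean((resynth - wet) ** 2))

# Toy usage: wet speech built by convolving a known dry signal
# with the same family of RIRs used inside the loss.
fs = 16000
dry = np.sin(2 * np.pi * 220 * np.arange(fs) / fs)
wet = np.convolve(dry, synth_rir(0.3, fs))[: len(dry)]

loss_dry = reverb_matching_loss(dry, wet, 0.3, fs)               # low: correct estimate
loss_zero = reverb_matching_loss(np.zeros_like(dry), wet, 0.3, fs)  # high: silent estimate
```

Note that no clean reference appears in the loss: only the wet signal and an RT60 estimate are needed, which is what makes the paradigm weakly supervised. In practice the RIR synthesis would need to be differentiable so gradients can flow to the dereverberation network.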