A Variational Physical Model for Estimating Room Impulse Responses

📅 2025-07-10
🤖 AI Summary
Most existing room impulse response (RIR) estimation methods rely either on statistical signal processing or on deep neural networks designed to replicate signal processing principles; combining statistical and physical modeling remains largely unexplored. This paper addresses that gap with an interpretable parametric decomposition of the RIR: white Gaussian noise shaped by a frequency-dependent exponential decay (e.g. modeling wall absorption) and by an autoregressive filter (e.g. modeling the microphone response). Parameters are estimated by minimizing a variational free-energy cost function. As a proof of concept, given dry and reverberant speech signals, the method outperforms classical deconvolution in noisy environments according to objective metrics. The intended application is preprocessing for tasks such as speech dereverberation, which in turn improves automatic speech recognition.

📝 Abstract
Room impulse response estimation is essential for tasks like speech dereverberation, which improves automatic speech recognition. Most existing methods rely on either statistical signal processing or deep neural networks designed to replicate signal processing principles. However, combining statistical and physical modeling for RIR estimation remains largely unexplored. This paper proposes a novel approach integrating both aspects through a theoretically grounded model. The RIR is decomposed into interpretable parameters: white Gaussian noise filtered by a frequency-dependent exponential decay (e.g. modeling wall absorption) and an autoregressive filter (e.g. modeling microphone response). A variational free-energy cost function enables practical parameter estimation. As a proof of concept, we show that given dry and reverberant speech signals, the proposed method outperforms classical deconvolution in noisy environments, as validated by objective metrics.
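The decomposition described in the abstract can be illustrated with a minimal forward-model sketch. This is not the paper's implementation: the function name `synth_rir`, the STFT-domain envelope, the linear interpolation of reverberation time across frequency, and the example AR coefficients are all assumptions chosen for illustration.

```python
import numpy as np
from scipy.signal import istft, lfilter, stft


def synth_rir(n_samples=8000, fs=16000, rt60_low=0.6, rt60_high=0.3,
              ar_coeffs=(1.0, -0.5), seed=0):
    """Draw one synthetic RIR: white noise -> per-frequency decay -> AR filter.

    All parameter names and defaults are hypothetical, for illustration only.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n_samples)  # white Gaussian excitation

    # Move to the time-frequency domain so each band can decay at its own rate.
    freqs, times, spec = stft(noise, fs=fs, nperseg=256)

    # Assumed reverberation time per frequency bin (seconds), interpolated
    # linearly from the lowest to the highest frequency.
    rt60 = np.linspace(rt60_low, rt60_high, len(freqs))

    # Amplitude decay rate such that the level drops by 60 dB after rt60 seconds.
    decay = 3.0 * np.log(10.0) / rt60
    envelope = np.exp(-decay[:, None] * times[None, :])

    _, shaped = istft(spec * envelope, fs=fs, nperseg=256)
    # Pad or trim so the output length matches the request exactly.
    shaped = np.pad(shaped, (0, max(0, n_samples - len(shaped))))[:n_samples]

    # All-pole (autoregressive) filter, e.g. standing in for a microphone response.
    return lfilter([1.0], list(ar_coeffs), shaped)
```

Calling `synth_rir()` returns a length-8000 array whose energy is concentrated in the early samples. In an estimation setting the decay rates and AR coefficients would be the unknowns to infer, rather than fixed inputs as here.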
Problem

Research questions and friction points this paper is trying to address.

Estimating room impulse responses for speech dereverberation
Combining statistical and physical modeling approaches
Improving performance in noisy environments over classical methods
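For context on the baseline, one common form of classical deconvolution is a regularized spectral division of the reverberant signal by the dry signal. The sketch below is an assumption about what such a baseline might look like; the summary does not specify which deconvolution variant the paper compares against, and the name `deconvolve_rir` and the regularization scheme are hypothetical.

```python
import numpy as np


def deconvolve_rir(dry, wet, rir_len, eps=1e-3):
    """Estimate an RIR from dry and reverberant signals by regularized
    frequency-domain division (an assumed baseline, not the paper's code)."""
    # FFT length covering the full reverberant signal, so circular and
    # linear convolution coincide.
    n_fft = int(2 ** np.ceil(np.log2(len(wet))))
    dry_spec = np.fft.rfft(dry, n_fft)
    wet_spec = np.fft.rfft(wet, n_fft)

    # Regularization scaled by the mean dry power avoids dividing by
    # near-zero bins, which would otherwise amplify noise.
    power = np.abs(dry_spec) ** 2
    rir_spec = wet_spec * np.conj(dry_spec) / (power + eps * np.mean(power))

    return np.fft.irfft(rir_spec, n_fft)[:rir_len]
```

With noise-free inputs and a small `eps`, the estimate closely matches the true response. With additive noise on the reverberant signal, bins where the dry spectrum is weak dominate the error, which is the regime where the abstract reports an advantage for the proposed method.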
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines statistical and physical modeling
Decomposes RIR into interpretable parameters
Uses variational free-energy cost function
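As background for the last point, the generic definition of a variational free energy is given below. This is the standard textbook form, not the paper's specific cost function, which is not reproduced on this page; the symbols $y$ (observed signals), $\theta$ (model parameters and latent variables), and $q$ (approximating distribution) are notation assumed here.

```latex
\mathcal{F}[q]
  = \mathbb{E}_{q(\theta)}\!\left[\log q(\theta) - \log p(y, \theta)\right]
  = \mathrm{KL}\!\left(q(\theta)\,\middle\|\,p(\theta \mid y)\right) - \log p(y)
```

Because the KL term is non-negative, minimizing $\mathcal{F}$ over $q$ tightens an upper bound on $-\log p(y)$ while pulling $q$ toward the posterior.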