BERP: A Blind Estimator of Room Acoustic and Physical Parameters for Single-Channel Noisy Speech Signals

📅 2024-05-07

🏛️ arXiv.org

📈 Citations: 2

✨ Influential: 0

🤖 AI Summary

Existing blind estimation methods generalize poorly across diverse acoustic environments under realistic noise conditions and typically estimate only a limited subset of room acoustic parameters (RAPs) or room physical parameters (RPPs). They fail to jointly infer quantities such as reverberation time (RT60), sound source distance, direction of arrival (DOA), and occupancy level. To address this, the authors propose a Sparse Stochastic Impulse Response (SSIR) model and a framework, BERP, that pairs a unified encoder with multiple separate predictors. The framework jointly and blindly estimates the RPPs and the SSIR parameters from noisy single-channel speech alone, without requiring clean speech or prior knowledge. The RAPs are then derived from room impulse responses (RIRs) synthesized from the SSIR model with the estimated parameters. Evaluated on a task-specific dataset compiled from several publicly available datasets, the method is reported to achieve state-of-the-art performance. The code is available on GitHub.

📝 Abstract

Room acoustic parameters (RAPs) and room physical parameters (RPPs) are essential metrics for parameterizing the room acoustical characteristics (RACs) of a sound field around a listener's local environment, offering comprehensive indications for various applications. Current RAP and RPP estimation methods either fall short of covering broad real-world acoustic environments in the context of real background noise or lack universal frameworks for blindly estimating RAPs and RPPs from noisy single-channel speech signals, particularly sound source distances, direction of arrival (DOA) of sound sources, and occupancy levels. In this paper, we propose a new universal blind estimation framework called the blind estimator of room acoustical and physical parameters (BERP). It introduces a new stochastic room impulse response (RIR) model, namely the sparse stochastic impulse response (SSIR) model, and equips the BERP with a unified encoder and multiple separate predictors that estimate the RPPs and the SSIR parameters in parallel. This estimation framework enables computationally efficient and universal estimation of room parameters using only noisy single-channel speech signals. Finally, all RAPs can be simultaneously derived from RIRs synthesized from the SSIR model with the estimated parameters. To evaluate the effectiveness of the proposed BERP and SSIR models, we compile a task-specific dataset from several publicly available datasets. The results reveal that the BERP achieves state-of-the-art (SOTA) performance. In addition, the evaluation results for the SSIR RIR model also demonstrate its efficacy. The code is available on GitHub.
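The abstract's last step, deriving a RAP from a synthesized RIR, can be illustrated with a minimal sketch. It assumes a classic exponentially decaying Gaussian-noise RIR as a stand-in (this is not the paper's SSIR model, whose details are not given here) and estimates RT60 with Schroeder backward integration, a standard technique. The function names `synth_rir`, `schroeder_edc_db`, and `estimate_rt60`, and the -5 dB to -25 dB fit range, are hypothetical choices for illustration only.

```python
import numpy as np


def synth_rir(rt60, fs=16000, duration=1.0, seed=0):
    """Stand-in RIR: Gaussian noise with an exponential envelope.

    Assumption: this is NOT the SSIR model, only a simple placeholder
    whose energy drops by 60 dB after `rt60` seconds.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(int(fs * duration)) / fs
    # Amplitude envelope 10^(-3 t / rt60), so energy is 10^(-6 t / rt60).
    envelope = np.exp(-3.0 * np.log(10.0) * t / rt60)
    return rng.standard_normal(t.size) * envelope


def schroeder_edc_db(rir):
    """Energy decay curve in dB via Schroeder backward integration."""
    energy = rir ** 2
    edc = np.cumsum(energy[::-1])[::-1]  # remaining energy from each sample on
    return 10.0 * np.log10(edc / edc[0])


def estimate_rt60(rir, fs, lo_db=-5.0, hi_db=-25.0):
    """Fit a line to the EDC between lo_db and hi_db, extrapolate to -60 dB."""
    edc = schroeder_edc_db(rir)
    idx = np.where((edc <= lo_db) & (edc >= hi_db))[0]
    slope, _ = np.polyfit(idx / fs, edc[idx], 1)  # slope in dB per second
    return -60.0 / slope


rir = synth_rir(rt60=0.5)
print(f"estimated RT60: {estimate_rt60(rir, fs=16000):.3f} s")
```

Other RAPs (for example clarity or definition indices) could be read off the same synthesized RIR in a similar way, which is presumably why estimating RIR-model parameters first lets all RAPs be derived together.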

Problem

Research questions and friction points this paper is trying to address.

Estimates room acoustical and physical parameters from noisy single-channel speech.

Proposes a universal framework for blind estimation of room parameters.

Improves estimation accuracy and adaptability in real-world scenarios.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified room feature encoder with attention mechanisms

Separate parametric predictors for parallel parameter estimation

Combines convolutional layers for local and global acoustic features
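The shared-encoder, separate-predictor layout listed above can be sketched in a few lines. This is only a structural illustration under loud assumptions: the encoder here is a single linear layer with `tanh` and mean pooling over time (the paper's encoder uses attention and convolutional layers, which are not reproduced), the weights are random and untrained, and the class name `SharedEncoderMultiHead` and the head names are hypothetical.

```python
import numpy as np


class SharedEncoderMultiHead:
    """One shared encoder feeding several independent scalar predictors.

    Assumption: a toy linear encoder stands in for the paper's
    attention/convolution encoder; weights are random, not trained.
    """

    def __init__(self, n_feat, n_hidden, head_names, seed=0):
        rng = np.random.default_rng(seed)
        self.enc_w = rng.standard_normal((n_feat, n_hidden)) * 0.1
        self.enc_b = np.zeros(n_hidden)
        # Each head has its own weights; only the encoder is shared.
        self.heads = {
            name: (rng.standard_normal((n_hidden, 1)) * 0.1, np.zeros(1))
            for name in head_names
        }

    def forward(self, frames):
        """frames: array of shape (T, n_feat), e.g. spectrogram frames."""
        hidden = np.tanh(frames @ self.enc_w + self.enc_b)  # (T, n_hidden)
        pooled = hidden.mean(axis=0)                        # (n_hidden,)
        # All heads read the same pooled embedding, so they run in parallel.
        return {
            name: float((pooled @ w + b)[0])
            for name, (w, b) in self.heads.items()
        }


model = SharedEncoderMultiHead(
    n_feat=40, n_hidden=16,
    head_names=["distance", "azimuth", "occupancy"],  # hypothetical targets
)
predictions = model.forward(np.ones((100, 40)))
print(sorted(predictions))
```

The design point is that the encoder is computed once per utterance and every predictor reuses its output, so adding another target parameter costs only one more small head.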
