🤖 AI Summary
This work addresses the challenge of reconstructing room impulse responses (RIRs) at positions where they were not measured, since measuring RIRs densely across a space is often impractical. The authors propose RIR-Former, a grid-free, one-step feed-forward model built on a transformer backbone that reconstructs RIRs at arbitrary microphone positions along an array. A sinusoidal encoding module injects microphone position information into the transformer, and a segmented multi-branch decoder handles early reflections and late reverberation separately, which improves reconstruction across the entire RIR. In experiments on diverse simulated acoustic environments, RIR-Former consistently outperforms state-of-the-art baselines in normalized mean square error (NMSE) and cosine distance (CD) under varying missing rates and array configurations. The evaluation covers randomly spaced linear arrays; complex array geometries, dynamic acoustic scenes, and real-world environments are left to future work.
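To make the position-encoding idea concrete, here is a minimal sketch of a sinusoidal encoding applied to microphone coordinates. It is an illustration under assumptions, not the authors' implementation: the function name `sinusoidal_encode`, the power-of-two frequency schedule, the number of frequencies, and the example positions are all hypothetical choices not stated in the source.

```python
import numpy as np

def sinusoidal_encode(coords, num_freqs=4):
    """Encode each coordinate x as [sin(2^k * pi * x), cos(2^k * pi * x)], k = 0..num_freqs-1.

    The frequency schedule is an assumption made for illustration only.
    """
    coords = np.asarray(coords, dtype=np.float64)      # shape (n_mics, n_dims)
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi      # shape (num_freqs,)
    angles = coords[..., None] * freqs                 # shape (n_mics, n_dims, num_freqs)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return feats.reshape(coords.shape[0], -1)          # shape (n_mics, n_dims * 2 * num_freqs)

# Hypothetical microphone positions (in metres) along a randomly spaced linear array.
mic_positions = np.array([[0.0], [0.13], [0.41], [1.0]])
enc = sinusoidal_encode(mic_positions, num_freqs=4)
print(enc.shape)
```

Because the encoding is a continuous function of the coordinate, a model fed these features can in principle be queried at positions that do not lie on any fixed grid, which is the property the summary describes as grid-free.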
📝 Abstract
Room impulse responses (RIRs) are essential for many acoustic signal processing tasks, yet measuring them densely across space is often impractical. In this work, we propose RIR-Former, a grid-free, one-step feed-forward model for RIR reconstruction. By introducing a sinusoidal encoding module into a transformer backbone, our method effectively incorporates microphone position information, enabling interpolation at arbitrary array locations. Furthermore, a segmented multi-branch decoder is designed to separately handle early reflections and late reverberation, improving reconstruction across the entire RIR. Experiments on diverse simulated acoustic environments demonstrate that RIR-Former consistently outperforms state-of-the-art baselines in terms of normalized mean square error (NMSE) and cosine distance (CD), under varying missing rates and array configurations. These results highlight the potential of our approach for practical deployment and motivate future work on scaling from randomly spaced linear arrays to complex array geometries, dynamic acoustic scenes, and real-world environments.
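The two reported metrics can be sketched as follows. This is a hedged illustration using common textbook definitions: NMSE as the error energy divided by the reference energy, expressed in dB, and cosine distance as one minus the cosine similarity. The abstract does not give the exact formulas, so the dB scaling, the `1 - similarity` form, and the function names `nmse_db` and `cosine_distance` are assumptions.

```python
import numpy as np

def nmse_db(h_true, h_est):
    """Normalized mean square error in dB (assumed definition): lower is better."""
    h_true = np.asarray(h_true, dtype=np.float64)
    h_est = np.asarray(h_est, dtype=np.float64)
    err_energy = np.sum((h_true - h_est) ** 2)
    ref_energy = np.sum(h_true ** 2)
    return 10.0 * np.log10(err_energy / ref_energy)

def cosine_distance(h_true, h_est):
    """One minus cosine similarity (assumed definition): 0 means identical shape."""
    h_true = np.asarray(h_true, dtype=np.float64)
    h_est = np.asarray(h_est, dtype=np.float64)
    sim = np.dot(h_true, h_est) / (np.linalg.norm(h_true) * np.linalg.norm(h_est))
    return 1.0 - sim

# Toy impulse responses, made up for illustration only.
h_ref = np.array([1.0, 0.0, 0.0, 0.0])
h_scaled = np.array([0.9, 0.0, 0.0, 0.0])   # same shape, 10% amplitude error
h_shifted = np.array([0.0, 1.0, 0.0, 0.0])  # same energy, wrong arrival time

print(nmse_db(h_ref, h_scaled), cosine_distance(h_ref, h_scaled))
print(cosine_distance(h_ref, h_shifted))
```

The two metrics are complementary: NMSE penalizes amplitude errors, while cosine distance is invariant to overall scaling and responds only to differences in waveform shape, such as a misplaced reflection.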