🤖 AI Summary
To address the poor generalization of room impulse response (RIR) estimation methods in mixed reality, particularly their inability to transfer across rooms with different geometries and surface materials, this paper introduces xRIR, a framework for cross-room RIR prediction. xRIR combines a geometric feature extractor, which captures spatial context from panorama depth images, with an RIR encoder that extracts acoustic features from a few reference RIR samples. Key contributions include: (1) ACOUSTICROOMS, a new dataset of over 300,000 high-fidelity simulated RIRs from 260 rooms; (2) experiments in which xRIR strongly outperforms a series of baselines; and (3) sim-to-real transfer evaluated on four real-world environments, which supports both the generalizability of the approach and the realism of the dataset.
📝 Abstract
In mixed reality applications, a realistic acoustic experience in spatial environments is as crucial as the visual experience for achieving true immersion. Despite recent advances in neural approaches for Room Impulse Response (RIR) estimation, most existing methods are limited to the single environment on which they are trained and cannot generalize to new rooms with different geometries and surface materials. We aim to develop a unified model capable of reconstructing the spatial acoustic experience of any environment with minimal additional measurements. To this end, we present xRIR, a framework for cross-room RIR prediction. The core of our generalizable approach lies in combining a geometric feature extractor, which captures spatial context from panorama depth images, with an RIR encoder that extracts detailed acoustic features from only a few reference RIR samples. To evaluate our method, we introduce ACOUSTICROOMS, a new dataset featuring high-fidelity simulation of over 300,000 RIRs from 260 rooms. Experiments show that our method strongly outperforms a series of baselines. Furthermore, we successfully perform sim-to-real transfer by evaluating our model on four real-world environments, demonstrating the generalizability of our approach and the realism of our dataset.
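For readers unfamiliar with how an RIR is used once it has been estimated, the following is a minimal sketch of the standard rendering step: convolving a dry (anechoic) signal with the RIR to obtain the reverberant signal a listener would hear. It does not reproduce xRIR or any part of the paper's model. The function name `render_with_rir` and the toy RIR values are assumptions made purely for illustration.

```python
import numpy as np


def render_with_rir(dry: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Convolve a dry signal with a room impulse response.

    Hypothetical helper for illustration only; not part of xRIR.
    The output has length len(dry) + len(rir) - 1.
    """
    return np.convolve(dry, rir, mode="full")


# Toy RIR (assumed values): a direct path at sample 0 and a single
# reflection arriving 3 samples later at half amplitude.
rir = np.zeros(5)
rir[0] = 1.0
rir[3] = 0.5

# Toy dry signal: two impulses of different strength.
dry = np.array([1.0, 0.0, 0.0, 2.0])

wet = render_with_rir(dry, rir)
print(wet)
```

Each impulse in the dry signal reappears three samples later at half its amplitude, which is the echo encoded by the toy RIR. A model that predicts the RIR for a new source and listener position would supply `rir` in place of the hand-written array; for long signals, an FFT-based convolution is usually preferred over direct convolution.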