Hearing Anywhere in Any Environment

📅 2025-04-14
🤖 AI Summary
To address the poor generalization of room impulse response (RIR) estimation methods in mixed reality, in particular their inability to transfer to rooms with different geometries and surface materials, this paper introduces xRIR, a framework for cross-room RIR prediction. xRIR combines a geometric feature extractor, which captures spatial context from panorama depth images, with an RIR encoder that extracts detailed acoustic features from a few reference RIR samples. Key contributions include: (1) ACOUSTICROOMS, a new dataset of high-fidelity simulations comprising over 300,000 RIRs from 260 rooms; (2) experiments in which xRIR strongly outperforms a series of baselines; and (3) sim-to-real transfer, evaluated on four real-world environments.

📝 Abstract
In mixed reality applications, a realistic acoustic experience in spatial environments is as crucial as the visual experience for achieving true immersion. Despite recent advances in neural approaches for Room Impulse Response (RIR) estimation, most existing methods are limited to the single environment on which they are trained, lacking the ability to generalize to new rooms with different geometries and surface materials. We aim to develop a unified model capable of reconstructing the spatial acoustic experience of any environment with minimal additional measurements. To this end, we present xRIR, a framework for cross-room RIR prediction. The core of our generalizable approach lies in combining a geometric feature extractor, which captures spatial context from panorama depth images, with an RIR encoder that extracts detailed acoustic features from only a few reference RIR samples. To evaluate our method, we introduce ACOUSTICROOMS, a new dataset featuring high-fidelity simulation of over 300,000 RIRs from 260 rooms. Experiments show that our method strongly outperforms a series of baselines. Furthermore, we successfully perform sim-to-real transfer by evaluating our model on four real-world environments, demonstrating the generalizability of our approach and the realism of our dataset.
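
For readers unfamiliar with how an RIR is used once it has been estimated, the sketch below shows the standard rendering step: convolving a dry (anechoic) signal with the RIR of a source-listener pair. It is not taken from the paper. The function name `render_with_rir` and the toy four-sample RIR are assumptions chosen purely for illustration.

```python
import numpy as np


def render_with_rir(dry, rir):
    """Convolve a dry signal with a room impulse response (hypothetical helper)."""
    dry = np.asarray(dry, dtype=float)
    rir = np.asarray(rir, dtype=float)
    # Full linear convolution: output length is len(dry) + len(rir) - 1.
    return np.convolve(dry, rir, mode="full")


# Toy RIR (illustrative only): direct sound at sample 0,
# plus one reflection 3 samples later at half amplitude.
toy_rir = np.array([1.0, 0.0, 0.0, 0.5])

# A two-sample "click" as the dry input.
click = np.array([1.0, 0.0])

wet = render_with_rir(click, toy_rir)
print(wet)
```

Real RIRs are typically thousands of samples long, so an FFT-based routine such as `scipy.signal.fftconvolve` is usually a faster choice than direct convolution.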

Problem

Research questions and friction points this paper is trying to address.

Generalize Room Impulse Response prediction across diverse environments
Reconstruct spatial acoustics with minimal additional measurements
Overcome limitations of single-environment trained neural approaches

Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines a geometric feature extractor on panorama depth images with an RIR encoder
Uses only a few reference RIR samples for prediction
Generalizes to new rooms with different geometries and surface materials
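
To make the few-reference setup concrete, here is a deliberately naive baseline. It is not xRIR and not a method from the paper. It predicts the RIR at a target listener position as an inverse-distance-weighted average of the reference RIRs, ignoring room geometry entirely. The function name `interpolate_rir`, the assumption that each reference RIR is tagged with a 3D listener position, and the array shapes are all hypothetical.

```python
import numpy as np


def interpolate_rir(ref_positions, ref_rirs, target_position, eps=1e-8):
    """Naive baseline (hypothetical): inverse-distance-weighted mix of reference RIRs.

    ref_positions:   (K, 3) listener positions of the K reference RIRs
    ref_rirs:        (K, T) reference RIR waveforms, T samples each
    target_position: (3,)   listener position to predict for
    """
    ref_positions = np.asarray(ref_positions, dtype=float)
    ref_rirs = np.asarray(ref_rirs, dtype=float)
    target_position = np.asarray(target_position, dtype=float)

    # Euclidean distance from the target to each reference position.
    dists = np.linalg.norm(ref_positions - target_position, axis=1)

    # Closer references get larger weights; eps avoids division by zero.
    weights = 1.0 / (dists + eps)
    weights /= weights.sum()

    # Weighted sum over the K references gives one (T,) waveform.
    return weights @ ref_rirs


# Two toy references, 2 m apart, with 4-sample RIRs (illustrative values).
positions = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
rirs = np.array([[1.0, 0.0, 0.4, 0.0], [0.0, 1.0, 0.0, 0.2]])

midpoint_rir = interpolate_rir(positions, rirs, [1.0, 0.0, 0.0])
print(midpoint_rir)
```

Averaging waveforms like this smears arrival times and knows nothing about walls or materials, which is the kind of gap a geometry-aware model is meant to close.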