Physics-Aware Novel-View Acoustic Synthesis with Vision-Language Priors and 3D Acoustic Environment Modeling

📅 2026-01-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing methods struggle to synthesize physically plausible spatial audio from novel viewpoints, largely because they neglect acoustic factors such as global geometric structure and material semantics. This work proposes Phys-NVAS, which the authors describe as the first physics-aware framework for novel-view acoustic synthesis. It reconstructs a 3D acoustic environment from multi-view images and depth maps, and combines it with physics-aware semantic priors (objects, scene layout, and material properties) extracted by a vision-language model. An acoustic feature fusion adapter merges the geometric and semantic cues into a single representation that drives binaural generation. On the RWAVS dataset, the method is reported to produce binaural audio with improved realism and physical consistency.

📝 Abstract
Spatial audio is essential for immersive experiences, yet novel-view acoustic synthesis (NVAS) remains challenging due to complex physical phenomena such as reflection, diffraction, and material absorption. Existing methods based on single-view or panoramic inputs improve spatial fidelity but fail to capture global geometry and semantic cues such as object layout and material properties. To address this, we propose Phys-NVAS, the first physics-aware NVAS framework that integrates spatial geometry modeling with vision-language semantic priors. A global 3D acoustic environment is reconstructed from multi-view images and depth maps to estimate room size and shape, enhancing spatial awareness of sound propagation. Meanwhile, a vision-language model extracts physics-aware priors of objects, layouts, and materials, capturing absorption and reflection beyond geometry. An acoustic feature fusion adapter unifies these cues into a physics-aware representation for binaural generation. Experiments on RWAVS demonstrate that Phys-NVAS yields binaural audio with improved realism and physical consistency.
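
To make concrete why room size and material absorption matter for sound propagation, the following is a minimal sketch using Sabine's classical reverberation formula, RT60 = 0.161 · V / A. It is not the Phys-NVAS method: the abstract does not mention Sabine's formula, and the function names `room_extent` and `sabine_rt60`, the shoebox-room simplification, and the absorption coefficients are all assumptions chosen for illustration. The sketch takes 3D points (such as those that could be back-projected from depth maps), measures an axis-aligned room extent, and estimates reverberation time from per-surface absorption.

```python
from typing import Dict, Iterable, Tuple

Point = Tuple[float, float, float]


def room_extent(points: Iterable[Point]) -> Tuple[float, float, float]:
    """Axis-aligned bounding-box size (in metres) of a 3D point set.

    Hypothetical helper: a crude stand-in for estimating room size
    from reconstructed geometry.
    """
    xs, ys, zs = zip(*points)
    return (max(xs) - min(xs), max(ys) - min(ys), max(zs) - min(zs))


def sabine_rt60(extent: Tuple[float, float, float],
                absorption: Dict[str, float]) -> float:
    """Sabine estimate RT60 = 0.161 * V / A for a shoebox room.

    `absorption` maps "floor", "ceiling", and "walls" to absorption
    coefficients in (0, 1]; A is the sum of surface area times coefficient.
    """
    lx, ly, lz = extent
    volume = lx * ly * lz
    areas = {
        "floor": lx * ly,
        "ceiling": lx * ly,
        "walls": 2.0 * lz * (lx + ly),
    }
    total_absorption = sum(areas[name] * absorption[name] for name in areas)
    return 0.161 * volume / total_absorption


# Two opposite corners of an assumed 5 m x 4 m x 3 m room.
corners = [(0.0, 0.0, 0.0), (5.0, 4.0, 3.0)]
extent = room_extent(corners)

# Illustrative coefficients only (not taken from the paper).
hard_room = {"floor": 0.05, "ceiling": 0.05, "walls": 0.05}
soft_room = {"floor": 0.30, "ceiling": 0.10, "walls": 0.05}

print(f"hard surfaces: RT60 = {sabine_rt60(extent, hard_room):.2f} s")
print(f"carpeted floor: RT60 = {sabine_rt60(extent, soft_room):.2f} s")
```

With the same geometry, raising the absorption of a single surface shortens the estimated reverberation time, which is the kind of material-dependent effect that geometry alone cannot capture.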
Problem

Research questions and friction points this paper is trying to address.

novel-view acoustic synthesis
spatial audio
physics-aware modeling
3D acoustic environment
vision-language priors
Innovation

Methods, ideas, or system contributions that make the work stand out.

Physics-aware acoustic synthesis
Novel-view audio synthesis
3D acoustic environment modeling
Vision-language priors
Binaural audio generation