🤖 AI Summary
Existing methods struggle to synthesize physically plausible spatial audio from novel viewpoints, primarily because they neglect critical acoustic factors such as global geometric structure and material semantics. This work proposes the first physics-aware framework for novel-view acoustic synthesis: it reconstructs a 3D acoustic environment from multi-view images and integrates physical semantic priors, such as material properties and scene layout, extracted via vision-language models. By jointly leveraging geometric and semantic cues, the method drives audio generation from both signals and significantly outperforms current approaches on the RWAVS dataset, improving both the perceptual realism and the physical consistency of the generated binaural audio.
📝 Abstract
Spatial audio is essential for immersive experiences, yet novel-view acoustic synthesis (NVAS) remains challenging due to complex physical phenomena such as reflection, diffraction, and material absorption. Existing methods based on single-view or panoramic inputs improve spatial fidelity but fail to capture global geometry and semantic cues such as object layout and material properties. To address this, we propose Phys-NVAS, the first physics-aware NVAS framework that integrates spatial geometry modeling with vision-language semantic priors. A global 3D acoustic environment is reconstructed from multi-view images and depth maps to estimate room size and shape, enhancing spatial awareness of sound propagation. Meanwhile, a vision-language model extracts physics-aware priors over objects, layouts, and materials, capturing absorption and reflection effects that geometry alone cannot. An acoustic feature fusion adapter unifies these cues into a physics-aware representation for binaural generation. Experiments on the RWAVS dataset demonstrate that Phys-NVAS yields binaural audio with improved realism and physical consistency.
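To make the fusion step concrete, here is a minimal, purely illustrative sketch of what an "acoustic feature fusion adapter" could look like: a geometric embedding (room size and layout cues) and a semantic embedding (material absorption and reflection cues) are blended into one physics-aware feature vector. The function name, the embeddings, and the gated-average fusion rule are all assumptions for illustration; the paper's actual adapter architecture is not specified in the abstract.

```python
# Hypothetical sketch of a feature-fusion adapter (NOT the paper's
# implementation): blend a geometry embedding and a semantic embedding
# into a single physics-aware representation via a gated average.

def fuse_acoustic_features(geom_feat, sem_feat, alpha=0.5):
    """Blend geometry and semantic feature vectors of equal dimension.

    alpha controls the relative weight of the geometric cues
    (alpha=1.0 uses geometry only, alpha=0.0 semantics only).
    """
    assert len(geom_feat) == len(sem_feat), "embeddings must share a dimension"
    return [alpha * g + (1.0 - alpha) * s for g, s in zip(geom_feat, sem_feat)]

# Toy embeddings standing in for encoded room-shape and material cues.
geom = [1.0, 0.0, 0.5]   # e.g. encoded room size / layout
sem = [0.0, 1.0, 0.5]    # e.g. encoded material absorption
fused = fuse_acoustic_features(geom, sem, alpha=0.5)
print(fused)  # [0.5, 0.5, 0.5]
```

In a real system the fused vector would condition the binaural audio generator; here the equal-weight blend simply shows how the two cue streams can share one representation.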