AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis

📅 2024-06-13
🏛️ Neural Information Processing Systems
📈 Citations: 1
Influential: 1

🤖 AI Summary
Existing novel-view acoustic synthesis (NVAS) methods are limited in how they model scene geometry and materials, how they represent the spatial relationship between sound source and listener, and how efficiently they run at inference. This paper proposes Audio-Visual Gaussian Splatting (AV-GS), the first method to enable explicit, joint geometry-material Gaussian point cloud modeling, conditioned on source-listener relative pose for binaural audio rendering. It introduces audio-guided point initialization together with sound-propagation-aware point cloud densification and pruning strategies to enhance acoustic fidelity. On the RWAS and SoundSpaces benchmarks, AV-GS substantially outperforms NeRF-based approaches, yielding more realistic and natural synthesized audio while achieving over 5× speedup in inference time.

📝 Abstract
Novel view acoustic synthesis (NVAS) aims to render binaural audio at any target viewpoint, given mono audio emitted by a sound source in a 3D scene. Existing methods have proposed NeRF-based implicit models to exploit visual cues as a condition for synthesizing binaural audio. However, in addition to the low efficiency caused by heavy NeRF rendering, these methods all have a limited ability to characterize the entire scene environment, such as room geometry, material properties, and the spatial relation between the listener and sound source. To address these issues, we propose a novel Audio-Visual Gaussian Splatting (AV-GS) model. To obtain a material-aware and geometry-aware condition for audio synthesis, we learn an explicit point-based scene representation with an audio-guidance parameter on locally initialized Gaussian points, taking into account the spatial relation between the listener and sound source. To make the visual scene model audio adaptive, we propose a point densification and pruning strategy that optimally distributes the Gaussian points according to each point's contribution to sound propagation (e.g., more points are needed for texture-less wall surfaces, as they affect sound path diversion). Extensive experiments validate the superiority of AV-GS over existing alternatives on the real-world RWAS and simulation-based SoundSpaces datasets.

Problem

Research questions and friction points this paper is trying to address.

Novel view acoustic synthesis for binaural audio rendering.
Limitations in characterizing scene geometry and material properties.
Inefficiency and lack of audio adaptability in existing methods.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Audio-Visual Gaussian Splatting for NVAS.
Material-aware and geometry-aware scene representation.
Point densification and pruning for sound propagation.
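
The densification and pruning idea can be illustrated with a small toy sketch. This is not the paper's implementation: the `Point` class, the `contribution` score, the thresholds, and the jitter-based cloning are all hypothetical stand-ins, chosen only to show the general pattern of removing points that contribute little to sound propagation and adding points where the contribution is high.

```python
import random
from dataclasses import dataclass


@dataclass
class Point:
    # 3D position of a Gaussian point (toy representation).
    position: tuple
    # Hypothetical per-point score for its contribution to sound propagation.
    contribution: float


def densify_and_prune(points, prune_below=0.05, densify_above=0.8,
                      jitter=0.01, seed=0):
    """Drop low-contribution points and clone high-contribution ones.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    result = []
    for p in points:
        if p.contribution < prune_below:
            continue  # prune: the point barely affects the rendered audio
        result.append(p)
        if p.contribution > densify_above:
            # densify: add a nearby copy with a small random offset
            new_pos = tuple(c + rng.uniform(-jitter, jitter) for c in p.position)
            result.append(Point(new_pos, p.contribution))
    return result


points = [
    Point((0.0, 0.0, 0.0), 0.9),   # high contribution: kept and cloned
    Point((1.0, 0.0, 0.0), 0.3),   # moderate contribution: kept as-is
    Point((2.0, 0.0, 0.0), 0.01),  # negligible contribution: pruned
]
result = densify_and_prune(points)
print(len(result))
```

In this toy setup a point on a large, texture-less wall would need a high `contribution` score to attract extra points, which mirrors the example given in the abstract; how that score is actually computed is left to the paper.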