🤖 AI Summary
Existing digital twin approaches predominantly focus on visual modeling, neglecting the critical role of acoustics in spatial realism and interactive experience. This paper introduces the first editable audio-visual digital twin system built entirely on commodity smartphones, overcoming the limitations of vision-only reconstruction by jointly modeling and co-editing geometry, surface materials, and acoustic fields. Our method integrates smartphone-captured room impulse responses (RIRs), vision-guided acoustic field estimation, differentiable acoustic rendering, and neural surface material inversion. It enables real-time interactive editing of geometric layouts and material properties, with synchronized updates of high-fidelity audio-visual renderings. Experiments conducted in real-world rooms demonstrate accurate geometric-acoustic reconstruction and consistent cross-modal editing performance. Crucially, the system requires no specialized acoustic hardware, substantially lowering the barrier to audio-visual digital twin construction.
📝 Abstract
Digital twins today are almost entirely visual, overlooking acoustics, a core component of spatial realism and interaction. We introduce AV-Twin, the first practical system that constructs editable audio-visual digital twins using only commodity smartphones. AV-Twin combines mobile RIR capture with a vision-assisted acoustic field model to efficiently reconstruct room acoustics. It further recovers per-surface material properties through differentiable acoustic rendering, enabling users to modify materials, geometry, and layout while automatically updating both audio and visuals. Together, these capabilities establish a practical path toward fully modifiable audio-visual digital twins for real-world environments.
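The per-surface material inversion described above can be illustrated with a toy sketch. This is not the paper's actual differentiable acoustic renderer: purely for illustration, it fits per-surface absorption coefficients to a target reverberation time (RT60) using Sabine's formula and hand-derived gradients. The room volume, surface areas, and target RT60 are all assumed values, not from the paper.

```python
import numpy as np

# Hypothetical shoebox room: volume and per-surface areas
# (floor, ceiling, four walls) -- assumed values for illustration.
V = 60.0                                             # volume in m^3
S = np.array([20.0, 20.0, 12.0, 12.0, 15.0, 15.0])   # areas in m^2

def rt60(alpha):
    # Sabine's formula: RT60 = 0.161 * V / (total absorption in sabins)
    return 0.161 * V / np.sum(S * alpha)

target_rt60 = 0.5          # "measured" reverberation time in seconds (assumed)
alpha = np.full(6, 0.2)    # initial guess for per-surface absorption

lr = 1e-3
for _ in range(2000):
    A = np.sum(S * alpha)           # total absorption
    pred = 0.161 * V / A
    # Gradient of squared error w.r.t. each alpha_i:
    # d(pred)/d(alpha_i) = -0.161 * V / A^2 * S_i
    grad = 2.0 * (pred - target_rt60) * (-0.161 * V / A**2) * S
    alpha = np.clip(alpha - lr * grad, 0.01, 0.99)   # keep physically valid

print(round(rt60(alpha), 3))  # → 0.5
```

In the real system this loss would be computed over full rendered RIRs rather than a single scalar RT60, but the same gradient-based inversion principle applies: differentiate the acoustic forward model with respect to material parameters and descend on the mismatch with the captured measurement.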