Building Audio-Visual Digital Twins with Smartphones

📅 2025-12-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing digital twin approaches predominantly focus on visual modeling, neglecting the critical role of acoustics in spatial realism and interactive experience. This paper introduces the first editable audio-visual digital twin system built entirely on commodity smartphones, overcoming the limitations of vision-only reconstruction by jointly modeling and co-editing geometry, surface materials, and acoustic fields. Our method integrates smartphone-captured room impulse responses (RIRs), vision-guided acoustic field estimation, differentiable acoustic rendering, and neural surface material inversion. It enables real-time interactive editing of geometric layouts and material properties, with synchronized updates of high-fidelity audio-visual renderings. Experiments conducted in real-world rooms demonstrate accurate geometric-acoustic reconstruction and consistent cross-modal editing performance. Crucially, the system requires no specialized acoustic hardware, substantially lowering the barrier to audio-visual digital twin construction.

📝 Abstract
Digital twins today are almost entirely visual, overlooking acoustics, a core component of spatial realism and interaction. We introduce AV-Twin, the first practical system that constructs editable audio-visual digital twins using only commodity smartphones. AV-Twin combines mobile RIR capture and a visual-assisted acoustic field model to efficiently reconstruct room acoustics. It further recovers per-surface material properties through differentiable acoustic rendering, enabling users to modify materials, geometry, and layout while automatically updating both audio and visuals. Together, these capabilities establish a practical path toward fully modifiable audio-visual digital twins for real-world environments.
Problem

Research questions and friction points this paper is trying to address.

How to construct editable audio-visual digital twins using only commodity smartphones
How to reconstruct room acoustics from mobile capture combined with visual-assisted modeling
How to recover surface material properties so that audio and visuals stay synchronized under edits
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses smartphones to create editable audio-visual digital twins
Combines mobile RIR capture with visual-assisted acoustic modeling
Recovers material properties via differentiable acoustic rendering
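The material-inversion idea can be illustrated with a toy sketch. This is not the paper's renderer; it uses the classical Sabine formula, RT60 = 0.161·V / Σᵢ(Sᵢ·aᵢ), which is differentiable in the per-surface absorption coefficients aᵢ, so a "measured" reverberation time can be inverted by gradient descent. Room volume, surface areas, and the target RT60 below are all assumed values for illustration.

```python
import numpy as np

# Toy material inversion via a differentiable acoustic model (Sabine's formula).
# All physical values are assumed, chosen only to make the example run.
V = 60.0                          # room volume, m^3
S = np.array([20.0, 20.0, 15.0])  # surface areas: walls, floor, ceiling, m^2
rt60_target = 0.45                # "measured" reverberation time, seconds

a = np.full(3, 0.2)               # initial guess for absorption coefficients
lr = 0.01
for _ in range(2000):
    A = np.dot(S, a)              # total absorption, Sabine units
    rt60 = 0.161 * V / A
    # gradient of (rt60 - target)^2 w.r.t. a_i, using
    # d(rt60)/d(a_i) = -0.161 * V * S_i / A^2
    grad = 2.0 * (rt60 - rt60_target) * (-0.161 * V * S / A**2)
    a = np.clip(a - lr * grad, 0.01, 1.0)  # keep coefficients physical

print(round(0.161 * V / np.dot(S, a), 3))  # converges to ~0.45
```

The real system replaces this scalar formula with a full differentiable acoustic renderer over measured RIRs, but the optimization loop has the same shape: render, compare to measurement, backpropagate into per-surface material parameters.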
Zitong Lan, University of Pennsylvania
Yiwei Tang, University of Pennsylvania
Yuhan Wang, University of Pennsylvania
Haowen Lai, University of Pennsylvania
Yido Hao, University of Pennsylvania
Mingmin Zhao, Assistant Professor, University of Pennsylvania
Wireless Sensing · Machine Learning · Digital Health