🤖 AI Summary
In augmented reality (AR), interactions between virtual and real-world objects often lack physically plausible audio feedback—specifically, sound effects that align with material properties and collision dynamics—leading to multisensory dissonance and reduced perceptual realism. To address this, we propose the first context-aware sound generation framework tailored for AR: it jointly estimates real-world object materials via real-time computer vision, models collision dynamics, and synthesizes parameterized acoustic signals to produce interaction-consistent, personalized sounds. Our end-to-end system integrates material segmentation, physics-based dynamics estimation, and low-latency audio rendering, enabling real-time auditory feedback in dynamic environments. A user study demonstrates that our method significantly improves perceived sound realism over generic sound effects (p < 0.01) and enhances users’ accuracy and confidence in distinguishing visually similar materials by 32% and 28%, respectively.
📝 Abstract
In Augmented Reality (AR), virtual objects interact with real objects. However, the lack of physicality of virtual objects leads to the absence of natural sonic interactions. When virtual and real objects collide, either no sound or a generic sound is played. Both lead to an incongruent multisensory experience, reducing interaction and object realism. Unlike in Virtual Reality (VR) and games, where predefined scenes and interactions allow for the playback of pre-recorded sound samples, AR requires real-time sound synthesis that dynamically adapts to novel contexts and objects to provide audiovisual congruence during interaction. To enhance real-virtual object interactions in AR, we propose a framework for context-aware sounds using methods from computer vision to recognize and segment the materials of real objects. The material's physical properties and the impact dynamics of the interaction are used to generate material-based sounds in real-time using physical modelling synthesis. In a user study with 24 participants, we compared our congruent material-based sounds to a generic sound effect, mirroring the current standard of non-context-aware sounds in AR applications. The results showed that material-based sounds led to significantly more realistic sonic interactions. Material-based sounds also enabled participants to distinguish visually similar materials with significantly greater accuracy and confidence. These findings show that context-aware, material-based sonic interactions in AR foster a stronger sense of realism and enhance our perception of real-world surroundings.