🤖 AI Summary
This study addresses the challenge of enhancing the embodiment and presence of conversational agents in vision-deprived interaction contexts - such as when users wear headphones or display-less glasses - where visual cues are unavailable. It presents a systematic integration of spatialized speech with contextually relevant Foley audio cues, investigating their combined impact on users' sense of co-presence and social perception through a within-subjects, two-factor experiment. Results demonstrate that this auditory embodiment approach significantly strengthens users' feeling of co-presence with the agent. However, it also diminishes perceived social attributes such as the agent's attentiveness, revealing trade-offs and design considerations for crafting agent presence under purely auditory conditions.
📝 Abstract
Embodiment can enhance conversational agents, such as by increasing their perceived presence. This is typically achieved through visual representations of a virtual body; however, visual modalities are not always available, such as when users interact with agents through headphones or display-less glasses. In this work, we explore auditory embodiment. By introducing auditory cues of bodily presence - through spatially localized voice and situated Foley audio from environmental interactions - we investigate how audio alone can convey embodiment and influence perceptions of a conversational agent. We conducted a 2 (spatialization: monaural vs. spatialized) x 2 (Foley: none vs. Foley) within-subjects study, in which participants (n=24) engaged in conversations with agents. Our results show that spatialization and Foley increase co-presence, but reduce users' perceptions of the agent's attention and other social attributes.