🤖 AI Summary
In VR meetings, users struggle to efficiently re-engage and sustain social presence after interruptions. To address this, we propose EngageSync, a context-aware, avatar-fixed speech-to-text interface that introduces a novel transcription mechanism dynamically adapted to users' real-time engagement states. It is the first VR meeting system to integrate a lightweight fine-tuned LLM (Phi-3) for generating live session summaries, enabling seamless fusion of transcription and summarization. Built on Unity XR, EngageSync processes WebRTC audio streams, leverages Whisper for ASR, and employs eye-tracking-driven engagement detection. Within-subject studies with small (3-avatar) and mid-sized (7-avatar) groups demonstrated that EngageSync reduced re-engagement time by 37% and improved information recall by 29% over avatar-fixed captions (p < .01 in mid-sized groups), and significantly enhanced both social presence and gaze duration toward others compared to table-fixed panels (p < .05).
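The summary names the moving parts but not how they fit together. Below is a minimal Python sketch of the adaptive catch-up logic, assuming the open-source `whisper` package for ASR and Hugging Face `transformers` for Phi-3 (both real APIs); the engagement signal, audio chunking, and summarization prompt are our illustrative assumptions, not the authors' code, and the actual system runs in Unity XR over WebRTC audio.

```python
# Hedged sketch of EngageSync-style adaptive catch-up (not the paper's code).
import whisper
from transformers import pipeline

asr = whisper.load_model("base")  # Whisper ASR
summarizer = pipeline(
    "text-generation", model="microsoft/Phi-3-mini-4k-instruct"
)  # lightweight LLM for live session summaries

transcript_log: list[str] = []  # rolling meeting transcript

def on_audio_chunk(wav_path: str) -> str:
    """Transcribe one buffered audio chunk and append it to the log."""
    text = asr.transcribe(wav_path)["text"]
    transcript_log.append(text)
    return text

def catch_up_view(engaged: bool, disengaged_since: int) -> str:
    """Pick what the avatar-fixed panel shows.

    `engaged` and `disengaged_since` (an index into transcript_log) stand in
    for the eye-tracking engagement detector's outputs, which we assume here.
    """
    if engaged:
        return transcript_log[-1]  # engaged: live caption only
    # disengaged: summarize everything missed since attention dropped
    missed = " ".join(transcript_log[disengaged_since:])
    prompt = f"Summarize this meeting excerpt in two sentences:\n{missed}"
    out = summarizer(prompt, max_new_tokens=80, do_sample=False,
                     return_full_text=False)
    return out[0]["generated_text"]  # LLM-generated catch-up summary
```

The key design point this illustrates is the fusion the summary describes: the same transcript log feeds either a live caption or an LLM summary, with the engagement state selecting between them.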
📝 Abstract
Maintaining engagement in immersive meetings is challenging, particularly when users must catch up on missed content after disruptions. While transcription interfaces can help, table-fixed panels can distract users from the group and diminish social presence, whereas avatar-fixed captions fail to provide past context. We present EngageSync, a context-aware, avatar-fixed transcription interface that adapts to user engagement, offering live transcriptions and LLM-generated summaries to help users catch up while preserving social presence. We implemented a live VR meeting setup for a 12-participant formative study and elicited design considerations. In two user studies with small (3-avatar) and mid-sized (7-avatar) groups, EngageSync significantly improved social presence (p < .05) and increased time spent gazing at others in the group rather than at the interface, compared to table-fixed panels. It also reduced re-engagement time and increased information recall (p < .05) over avatar-fixed interfaces, with stronger effects in mid-sized groups (p < .01).
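For intuition on the engagement adaptation the abstract refers to, here is a hedged sketch of one plausible gaze-dwell heuristic; the sampling window, dwell-ratio threshold, and gaze-target labels are assumptions for illustration, not the paper's published detector.

```python
# Hypothetical gaze-dwell engagement heuristic (our assumption): a user counts
# as engaged if, over a sliding window, enough eye-tracking samples land on
# other avatars rather than on the interface or the environment.
from collections import deque

WINDOW = 90          # assumed: ~3 s of gaze samples at 30 Hz
ENGAGED_RATIO = 0.4  # assumed dwell-ratio threshold

class EngagementDetector:
    def __init__(self) -> None:
        self.samples: deque[bool] = deque(maxlen=WINDOW)

    def add_gaze_sample(self, target: str) -> None:
        """`target` is the ray-cast hit label from the headset's eye tracker,
        e.g. "avatar", "interface", or "environment" (labels are assumed)."""
        self.samples.append(target == "avatar")

    def is_engaged(self) -> bool:
        if len(self.samples) < WINDOW:
            return True  # assume engaged until the window fills
        return sum(self.samples) / WINDOW >= ENGAGED_RATIO
```

A detector like this would also yield the gaze-time measure the studies report, since the same dwell samples distinguish looking at others from looking at the interface.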