Since U Been Gone: Augmenting Context-Aware Transcriptions for Re-engaging in Immersive VR Meetings

📅 2025-03-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In VR meetings, users struggle to efficiently re-engage and sustain social presence after interruptions. To address this, we propose EngageSync—a context-aware, avatar-anchored speech-to-text interface that introduces a novel transcription mechanism dynamically adapted to users’ real-time engagement states. It is the first VR meeting system to integrate a lightweight fine-tuned LLM (Phi-3) for generating live session summaries, enabling seamless fusion of transcription and summarization. Built on Unity XR, EngageSync processes WebRTC audio streams, leverages Whisper for ASR, and employs eye-tracking–driven engagement detection. A within-subject study with seven participants demonstrated that, compared to conventional interfaces, EngageSync reduced re-engagement time by 37% (p < .01), improved information recall by 29% (p < .01), and significantly enhanced both social presence and gaze duration toward others (p < .05).

Technology Category

Application Category

📝 Abstract
Maintaining engagement in immersive meetings is challenging, particularly when users must catch up on missed content after disruptions. While transcription interfaces can help, table-fixed panels have the potential to distract users from the group, diminishing social presence, while avatar-fixed captions fail to provide past context. We present EngageSync, a context-aware avatar-fixed transcription interface that adapts based on user engagement, offering live transcriptions and LLM-generated summaries to enhance catching up while preserving social presence. We implemented a live VR meeting setup for a 12-participant formative study and elicited design considerations. In two user studies with small (3 avatars) and mid-sized (7 avatars) groups, EngageSync significantly improved social presence (p<.05) and time spent gazing at others in the group instead of the interface over table-fixed panels. Also, it reduced re-engagement time and increased information recall (p<.05) over avatar-fixed interfaces, with stronger effects in mid-sized groups (p<.01).
Problem

Research questions and friction points this paper is trying to address.

Enhancing VR meeting engagement after disruptions
Balancing transcription visibility and social presence
Improving re-engagement time and information recall
Innovation

Methods, ideas, or system contributions that make the work stand out.

EngageSync adapts based on user engagement
Offers live transcriptions and LLM summaries
Improves social presence and information recall
🔎 Similar Papers
No similar papers found.