🤖 AI Summary
This study addresses the limited naturalness and responsiveness of avatar-mediated dialogue in VR. We propose a real-time multi-avatar dialogue system driven by a locally deployed large language model (LLM), integrating automatic speech recognition (ASR), text-to-speech (TTS), and lip-sync rendering. Our method introduces an LLM-guided finite-state machine for avatar behavior control and a retrieval-augmented generation (RAG)-enhanced context-aware response mechanism. Crucially, we conduct the first systematic VR evaluation of how avatar state indicators (thinking, listening, and generating states) affect perceived response latency and user immersion. A pilot user study demonstrates that these three states significantly improve realism and response predictability in task-oriented dialogues. The work contributes a reusable system architecture for task-oriented conversational AI in VR and empirically grounded design guidelines for avatar state signaling.
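The paper does not include code, so the following is only a minimal Python sketch of the avatar status indicator state machine described above, assuming event-driven transitions between idle, listening, thinking, and generating states; all class, event, and callback names are hypothetical, and in the actual system the transitions could be guided by the LLM rather than hard-coded.

```python
from enum import Enum, auto


class AvatarState(Enum):
    """Visible status of one avatar (names hypothetical)."""
    IDLE = auto()
    LISTENING = auto()   # user is speaking; ASR is transcribing
    THINKING = auto()    # transcript sent to the local LLM; awaiting first token
    GENERATING = auto()  # tokens streaming; TTS and lip-sync play back


class AvatarStateMachine:
    """Event-driven skeleton for the indicator logic.

    The paper describes the state machine as LLM-guided; here the
    transitions are hard-coded pipeline events purely for illustration.
    """

    TRANSITIONS = {
        (AvatarState.IDLE, "speech_detected"): AvatarState.LISTENING,
        (AvatarState.LISTENING, "transcript_ready"): AvatarState.THINKING,
        (AvatarState.THINKING, "first_token"): AvatarState.GENERATING,
        (AvatarState.GENERATING, "response_done"): AvatarState.IDLE,
    }

    def __init__(self, on_change=None):
        self.state = AvatarState.IDLE
        self.on_change = on_change  # e.g. swaps the indicator shown in the VR scene

    def dispatch(self, event: str) -> AvatarState:
        nxt = self.TRANSITIONS.get((self.state, event))
        if nxt is not None:
            self.state = nxt
            if self.on_change:
                self.on_change(nxt)
        return self.state


if __name__ == "__main__":
    sm = AvatarStateMachine(on_change=lambda s: print("indicator ->", s.name))
    for event in ("speech_detected", "transcript_ready",
                  "first_token", "response_done"):
        sm.dispatch(event)
```

Keying the indicator to pipeline milestones (speech detected, transcript ready, first token) is one way to let the avatar signal progress before the full reply is synthesized, which is the response-generation window in which the study's status indicators are shown.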
📝 Abstract
We present a virtual reality (VR) environment featuring conversational avatars powered by a locally deployed LLM, integrated with automatic speech recognition (ASR), text-to-speech (TTS), and lip-syncing. Through a pilot study, we explored the effects of three types of avatar status indicators during response generation. Our findings reveal design considerations for improving responsiveness and realism in LLM-driven conversational systems. We also detail two system architectures: one using an LLM-based state machine to control avatar behavior, and another integrating retrieval-augmented generation (RAG) for context-grounded responses. Together, these contributions offer practical insights to guide future work on task-oriented conversational AI in VR environments.
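For the second architecture, the sketch below shows one plausible shape of the RAG context-grounding step, assuming a toy keyword-overlap retriever and a placeholder prompt template; the paper does not specify its retrieval stack, and a real deployment would likely use embedding similarity over a vector index.

```python
# Minimal RAG sketch for the context-grounded response path.
# Everything here is illustrative: the retriever, scoring, and
# prompt format are assumptions, not the paper's actual stack.
from dataclasses import dataclass


@dataclass
class Document:
    text: str


def retrieve(query: str, docs: list[Document], k: int = 3) -> list[Document]:
    """Rank documents by naive keyword overlap with the query."""
    terms = set(query.lower().split())
    return sorted(
        docs,
        key=lambda d: len(terms & set(d.text.lower().split())),
        reverse=True,
    )[:k]


def build_prompt(query: str, docs: list[Document]) -> str:
    """Assemble a context-grounded prompt for the local LLM."""
    context = "\n".join(f"- {d.text}" for d in docs)
    return (
        "Answer the user using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"User: {query}\nAvatar:"
    )


if __name__ == "__main__":
    corpus = [
        Document("The repair task requires a torque wrench."),
        Document("Avatars greet users on entry."),
        Document("Torque the bolts to 40 Nm in a star pattern."),
    ]
    top = retrieve("Which torque for the bolts?", corpus)
    print(build_prompt("Which torque for the bolts?", top))
```

Grounding the prompt in retrieved task documents is what keeps the avatar's replies tied to the task at hand rather than the LLM's general knowledge, which is the role the abstract assigns to the RAG variant.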