🤖 AI Summary
Current AI-driven non-player characters (NPCs) in virtual reality (VR) suffer from insufficient realism and suboptimal interactive performance. Method: This study develops a GPT-4 Turbo–based interrogation simulation system for VR, integrating speech-to-text (STT), text-to-speech (TTS), and end-to-end latency measurement. It introduces the first multidimensional empirical evaluation combining the System Usability Scale (SUS), Game Experience Questionnaire (GEQ), and a Virtual Agent Trustworthiness Scale. Contribution/Results: The system achieves a mean interaction latency of 7 seconds, SUS score of 79.44 (above benchmark), and trustworthiness rating of 6.67/10. NPCs demonstrate high behavioral plausibility and intelligent responsiveness, though affective expression remains limited. Findings confirm that large language models (LLMs) significantly enhance VR immersion but expose critical bottlenecks in real-time responsiveness, affective modeling, and trustworthy interaction—providing empirical evidence and methodological guidance for optimizing AI-NPCs in high-fidelity VR applications.
📝 Abstract
Advancements in artificial intelligence (AI) have significantly enhanced the realism and interactivity of non-player characters (NPCs) in virtual reality (VR), creating more engaging and believable user experiences. This paper evaluates AI-driven NPCs within a VR interrogation simulator, focusing on their perceived realism, usability, and system performance. The simulator features two AI-powered NPCs, a suspect, and a partner, using GPT-4 Turbo to engage participants in a scenario to determine the suspect's guilt or innocence. A user study with 18 participants assessed the system using the System Usability Scale (SUS), Game Experience Questionnaire (GEQ), and a Virtual Agent Believability Questionnaire, alongside latency measurements for speech-to-text (STT), text-to-speech (TTS), OpenAI GPT-4 Turbo, and overall (cycle) latency. Results showed an average cycle latency of 7 seconds, influenced by the increasing conversational context. Believability scored 6.67 out of 10, with high ratings in behavior, social relationships, and intelligence but moderate scores in emotion and personality. The system achieved a SUS score of 79.44, indicating good usability. These findings demonstrate the potential of large language models to improve NPC realism and interaction in VR while highlighting challenges in reducing system latency and enhancing emotional depth. This research contributes to the development of more sophisticated AI-driven NPCs, revealing the need for performance optimization to achieve increasingly immersive virtual experiences.