An Empirical Evaluation of AI-Powered Non-Player Characters' Perceived Realism and Performance in Virtual Reality Environments

📅 2025-07-14
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Current AI-driven non-player characters (NPCs) in virtual reality (VR) suffer from insufficient realism and suboptimal interactive performance. Method: This study develops a GPT-4 Turbo–based interrogation simulation system for VR, integrating speech-to-text (STT), text-to-speech (TTS), and end-to-end latency measurement. It introduces the first multidimensional empirical evaluation combining the System Usability Scale (SUS), Game Experience Questionnaire (GEQ), and a Virtual Agent Trustworthiness Scale. Contribution/Results: The system achieves a mean interaction latency of 7 seconds, SUS score of 79.44 (above benchmark), and trustworthiness rating of 6.67/10. NPCs demonstrate high behavioral plausibility and intelligent responsiveness, though affective expression remains limited. Findings confirm that large language models (LLMs) significantly enhance VR immersion but expose critical bottlenecks in real-time responsiveness, affective modeling, and trustworthy interaction—providing empirical evidence and methodological guidance for optimizing AI-NPCs in high-fidelity VR applications.

Technology Category

Application Category

📝 Abstract
Advancements in artificial intelligence (AI) have significantly enhanced the realism and interactivity of non-player characters (NPCs) in virtual reality (VR), creating more engaging and believable user experiences. This paper evaluates AI-driven NPCs within a VR interrogation simulator, focusing on their perceived realism, usability, and system performance. The simulator features two AI-powered NPCs, a suspect, and a partner, using GPT-4 Turbo to engage participants in a scenario to determine the suspect's guilt or innocence. A user study with 18 participants assessed the system using the System Usability Scale (SUS), Game Experience Questionnaire (GEQ), and a Virtual Agent Believability Questionnaire, alongside latency measurements for speech-to-text (STT), text-to-speech (TTS), OpenAI GPT-4 Turbo, and overall (cycle) latency. Results showed an average cycle latency of 7 seconds, influenced by the increasing conversational context. Believability scored 6.67 out of 10, with high ratings in behavior, social relationships, and intelligence but moderate scores in emotion and personality. The system achieved a SUS score of 79.44, indicating good usability. These findings demonstrate the potential of large language models to improve NPC realism and interaction in VR while highlighting challenges in reducing system latency and enhancing emotional depth. This research contributes to the development of more sophisticated AI-driven NPCs, revealing the need for performance optimization to achieve increasingly immersive virtual experiences.
Problem

Research questions and friction points this paper is trying to address.

Evaluating AI-driven NPCs' realism and performance in VR
Assessing usability and believability of AI-powered virtual characters
Investigating system latency and emotional depth in VR interactions
Innovation

Methods, ideas, or system contributions that make the work stand out.

GPT-4 Turbo for NPC dialogue interaction
Multi-metric evaluation including SUS and GEQ
Real-time latency optimization for VR
🔎 Similar Papers
No similar papers found.