Harnessing Large Language Model for Virtual Reality Exploration Testing: A Case Study

📅 2025-01-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing VR field-of-view (FOV) analysis suffers from low entity recognition accuracy and weak cross-frame spatial understanding. Method: This paper introduces the first systematic investigation of multimodal large language models (MLLMs)—specifically GPT-4o—for automated VR FOV analysis, proposing a structured prompt engineering framework that integrates feature encoding (color, position, shape) with cross-frame entity matching. Contribution/Results: Experiments demonstrate a significant improvement in entity recognition accuracy—from 41.67% to 71.30% (F1 = 0.70)—with feature description accuracy ≥90%, high-precision scene classification, and robust spatial relation modeling. The study identifies critical discriminative dimensions and annotation limitations of LLMs in VR visual understanding, establishing a reproducible methodology and technical pathway for intelligent testing and automatic annotation in immersive environments.

📝 Abstract
As the Virtual Reality (VR) industry expands, the need for automated GUI testing is growing rapidly. Large Language Models (LLMs), capable of retaining information long-term and analyzing both visual and textual data, are emerging as a potential key to deciphering the complexities of VR's evolving user interfaces. In this paper, we conduct a case study to investigate the capability of using LLMs, particularly GPT-4o, for field of view (FOV) analysis in VR exploration testing. Specifically, we validate that LLMs can identify test entities in FOVs and that prompt engineering can effectively raise the accuracy of test entity identification from 41.67% to 71.30%. Our study also shows that LLMs can describe identified entities' features with a correctness rate of at least 90%. We further find that the core features that effectively represent an entity are color, placement, and shape. Moreover, combining these three features is particularly effective for determining identical entities across multiple FOVs, achieving the highest F1-score of 0.70. Additionally, our study demonstrates that LLMs are capable of scene recognition and spatial understanding in VR with precisely designed structured prompts. Finally, we find that LLMs fail to label the identified test entities, and we discuss potential solutions as future research directions.
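The cross-frame matching idea described above (deciding whether two entities seen in different FOVs are the same by comparing their color, placement, and shape) can be sketched as a simple feature-agreement check. This is an illustrative reconstruction, not the paper's implementation; the `Entity` fields, the equal-weight scoring, and the greedy pairing strategy are all assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entity:
    """Hypothetical representation of a test entity extracted from one FOV."""
    name: str
    color: str      # e.g. "red"
    placement: str  # coarse position in the FOV, e.g. "center", "top-left"
    shape: str      # e.g. "cube", "sphere"

def match_score(a: Entity, b: Entity) -> float:
    """Fraction of the three core features (color, placement, shape) that agree."""
    return sum([a.color == b.color,
                a.placement == b.placement,
                a.shape == b.shape]) / 3

def match_entities(fov_a: list[Entity], fov_b: list[Entity],
                   threshold: float = 1.0) -> list[tuple[Entity, Entity]]:
    """Greedily pair entities across two FOVs whose feature agreement
    meets the threshold. Each entity in fov_b is matched at most once."""
    matches, used = [], set()
    for ea in fov_a:
        best, best_score = None, 0.0
        for i, eb in enumerate(fov_b):
            if i in used:
                continue
            s = match_score(ea, eb)
            if s >= threshold and s > best_score:
                best, best_score = i, s
        if best is not None:
            used.add(best)
            matches.append((ea, fov_b[best]))
    return matches
```

With `threshold=1.0` only entities agreeing on all three features are treated as identical; lowering the threshold trades precision for recall, which mirrors the paper's observation that the three-feature combination yields the best F1-score.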
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Virtual Reality
Scene Understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models
VR Game Analysis
Automatic Testing