π€ AI Summary
Traditional non-player characters (NPCs) in games often rely on scripted dialogues and lack spatial awareness, limiting their ability to respond dynamically to player actions and diminishing immersion. This work proposes a novel approach that integrates large language models (LLMs) with computer vision to endow NPCs with real-time environmental perception. By performing semantic segmentation on panoramic images, the method generates a structured JSON representation of the scene, enriched with scene graphs and directional vector encodings to provide LLMs with interpretable spatial context. This enables NPCs to produce contextually grounded dialogue that reflects their surroundings. To our knowledge, this is the first framework to jointly incorporate panoramic vision, semantic segmentation, and directional cues as inputs to an LLM for interactive agent perception. User studies demonstrate that players significantly prefer NPCs powered by this approach over baseline agents without environmental awareness, confirming its effectiveness in enhancing interaction naturalness and immersion.
π Abstract
We present an approach for enhancing non-playable characters (NPCs) in games by combining large language models (LLMs) with computer vision to provide contextual awareness of their surroundings. Conventional NPCs typically rely on pre-scripted dialogue and lack spatial understanding, which limits their responsiveness to player actions and reduces overall immersion. Our method addresses these limitations by capturing panoramic images of an NPC's environment and applying semantic segmentation to identify objects and their spatial positions. The extracted information is used to generate a structured JSON representation of the environment, combining object locations derived from segmentation with additional scene graph data within the NPC's bounding sphere, encoded as directional vectors. This representation is provided as input to the LLM, enabling NPCs to incorporate spatial knowledge into player interactions. As a result, NPCs can dynamically reference nearby objects, landmarks, and environmental features, leading to more believable and engaging gameplay. We describe the technical implementation of the system and evaluate it in two stages. First, an expert interview was conducted to gather feedback and identify areas for improvement. After integrating these refinements, a user study was performed, showing that participants preferred the context-aware NPCs over a non-context-aware baseline, confirming the effectiveness of the proposed approach.