AI Summary
This work addresses the novel task of spatial transition point (STP) detection and main spatial transition point (MSTP) identification from single-frame 3D game images. We propose a two-stage deep learning framework, presented as the first for this task: Stage I employs a Faster R-CNN-based detector to localize traversable STPs; Stage II introduces a lightweight ranking network that fuses local and global visual features to identify the unique MSTP leading to the player's current macro-goal. Both stages use parameter-efficient adapters, with an optional retrieval-augmented fusion step. Evaluated on a dataset drawn from five action RPG titles, adapter-only transfer proves more robust than full-network fine-tuning in low-data settings and for MSTP selection, while full fine-tuning gives better STP detection when data is sufficient. We establish a first benchmark for this task, aimed at supporting map construction and navigation assistance in 3D game environments.
Abstract
In complex 3D game environments, players rely on visual affordances to spot map transition points. Efficient identification of such points is important for client-side auto-mapping and provides an objective basis for evaluating map cue presentation. In this work, we formalize a two-part task on a single game frame and propose it as a new research focus: detecting traversable Spatial Transition Points (STPs), which are connectors between two sub-regions, and selecting the Main STP (MSTP), the unique STP that lies on the designer-intended critical path toward the player's current macro-objective. We introduce a two-stage deep-learning pipeline that first detects potential STPs using Faster R-CNN and then ranks them with a lightweight MSTP selector that fuses local and global visual features. Both stages benefit from parameter-efficient adapters, and we further introduce an optional retrieval-augmented fusion step. Our primary goal is to establish the feasibility of this problem and set baseline performance metrics. We validate our approach on a custom-built, diverse dataset collected from five Action RPG titles. Our experiments reveal a key trade-off: full-network fine-tuning produces superior STP detection when sufficient data is available, whereas adapter-only transfer is significantly more robust and effective in low-data scenarios and for the MSTP selection task. By defining this novel problem, providing a baseline pipeline and dataset, and offering initial insights into efficient model adaptation, we aim to contribute to future AI-driven navigation aids and data-informed level-design tools.
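To make the second stage concrete, the following is a minimal sketch of how MSTP selection over Stage I detections might look. It is an illustration under stated assumptions, not the pipeline described above: the `Candidate` structure, the concatenation-based `fuse` function, the linear scorer, and the `min_det_score` threshold are all hypothetical, and the feature vectors stand in for whatever local (per-box) and global (whole-frame) features the actual model extracts.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Candidate:
    """One Stage I detection (hypothetical structure, not from the paper)."""
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    det_score: float                        # detector confidence
    local_feat: List[float]                 # features pooled from the box region


def fuse(local_feat: Sequence[float], global_feat: Sequence[float]) -> List[float]:
    """Fuse local and global features by concatenation (an assumption)."""
    return list(local_feat) + list(global_feat)


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a non-empty sequence."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def select_mstp(
    candidates: Sequence[Candidate],
    global_feat: Sequence[float],
    weights: Sequence[float],
    bias: float = 0.0,
    min_det_score: float = 0.5,
) -> Tuple[Optional[int], List[float]]:
    """Rank detected STPs and return (index of the chosen MSTP, probabilities).

    The index refers to the original `candidates` list; the probabilities
    cover only candidates that passed the detection threshold, in order.
    Returns (None, []) when no candidate passes the threshold.
    """
    kept = [(i, c) for i, c in enumerate(candidates) if c.det_score >= min_det_score]
    if not kept:
        return None, []

    logits = []
    for _, cand in kept:
        fused = fuse(cand.local_feat, global_feat)
        if len(fused) != len(weights):
            raise ValueError("weights must match the fused feature length")
        # Linear scorer as a stand-in for the lightweight ranking network.
        logits.append(sum(w * x for w, x in zip(weights, fused)) + bias)

    probs = softmax(logits)
    best = max(range(len(kept)), key=probs.__getitem__)
    return kept[best][0], probs
```

Normalizing scores with a softmax across the candidates in one frame reflects the constraint that exactly one STP is the MSTP; a learned ranking network would replace the linear scorer while keeping the same selection step.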