🤖 AI Summary
This work addresses cascading errors in large language models (LLMs) performing structured spatial navigation in complex topologies. The authors propose STAR, a two-stage framework: the first stage uses supervised fine-tuning to internalize spatial semantics and prune redundant paths, and the second applies Spatial-aware Segment-level Direct Preference Optimization (SDPO) to enable self-correction during long-horizon navigation. The authors also construct RedMaze-23K, a dataset annotated with human-inspired turning points, and combine turning-point alignment with segment-level DPO to strengthen spatial reasoning. In experiments, STAR-32B achieves state-of-the-art performance among open-source models with an accuracy of 29.27%, surpassing DeepSeek-V3 (25.00%) and reaching 82.4% of GPT-4's performance.
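The summary does not say how turning points are annotated in RedMaze-23K. As a rough illustration only, the sketch below assumes a 4-connected grid maze and treats a turning point as any cell where the direction of travel changes; the path is then split into straight segments anchored at those cells. The names `turning_point_indices` and `split_at_turns` are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Cell = Tuple[int, int]  # (row, col) on a grid maze; assumed 4-connected moves


def turning_point_indices(path: List[Cell]) -> List[int]:
    """Indices of cells where the move direction changes (assumed definition)."""
    indices = []
    for i in range(1, len(path) - 1):
        incoming = (path[i][0] - path[i - 1][0], path[i][1] - path[i - 1][1])
        outgoing = (path[i + 1][0] - path[i][0], path[i + 1][1] - path[i][1])
        if incoming != outgoing:
            indices.append(i)
    return indices


def turning_points(path: List[Cell]) -> List[Cell]:
    """Cells at which the path turns."""
    return [path[i] for i in turning_point_indices(path)]


def split_at_turns(path: List[Cell]) -> List[List[Cell]]:
    """Split a path into straight segments; adjacent segments share the turning cell."""
    if not path:
        return []
    bounds = [0] + turning_point_indices(path) + [len(path) - 1]
    return [path[a:b + 1] for a, b in zip(bounds, bounds[1:])]


path = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 3)]
print(turning_points(path))  # [(0, 2), (2, 2)]
print(split_at_turns(path))
```

Segments of this kind are one plausible unit for the segment-level preference comparisons mentioned above, but the paper's actual segmentation may differ.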
📝 Abstract
Structured spatial navigation is a core benchmark for the spatial reasoning of Large Language Models (LLMs). Existing paradigms such as Visualization-of-Thought (VoT) are prone to cascading errors in complex topologies. To address this, we propose STAR, a two-stage framework grounded in topological anchors, and introduce the RedMaze-23K dataset with human-inspired turning-point annotations. The first stage uses supervised fine-tuning to help models internalize spatial semantics and prune redundant paths. The second stage adopts Spatial-aware Segment-level Direct Preference Optimization (SDPO) to refine self-correction in long-horizon navigation. Experiments show that STAR achieves state-of-the-art performance among open-source models: its 32B variant outperforms DeepSeek-V3 (29.27% vs. 25.00%) and reaches 82.4% of GPT-4's performance.
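The abstract does not give the SDPO objective, and its spatial-aware components are not described here. The sketch below shows only the standard DPO loss of Rafailov et al., restricted so that the log-probabilities are summed over a single token segment (for example, the segment where a chosen and a rejected navigation trace diverge) instead of the whole response. The function names, the `(start, end)` span convention, and `beta=0.1` are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import Sequence, Tuple

Span = Tuple[int, int]  # [start, end) token indices of the compared segment (assumed)


def segment_logprob(token_logprobs: Sequence[float], span: Span) -> float:
    """Sum of per-token log-probabilities inside the segment only."""
    start, end = span
    return sum(token_logprobs[start:end])


def neg_log_sigmoid(x: float) -> float:
    """-log(sigmoid(x)) = softplus(-x), written to avoid overflow for large |x|."""
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


def segment_dpo_loss(
    policy_chosen: Sequence[float],
    policy_rejected: Sequence[float],
    ref_chosen: Sequence[float],
    ref_rejected: Sequence[float],
    chosen_span: Span,
    rejected_span: Span,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for one preference pair, scored on one segment each."""
    pc = segment_logprob(policy_chosen, chosen_span)
    pr = segment_logprob(policy_rejected, rejected_span)
    rc = segment_logprob(ref_chosen, chosen_span)
    rr = segment_logprob(ref_rejected, rejected_span)
    # Implicit reward margin: how much more the policy (relative to the frozen
    # reference) prefers the chosen segment over the rejected one.
    margin = beta * ((pc - rc) - (pr - rr))
    return neg_log_sigmoid(margin)


ref = [-1.0, -0.5, -0.5]
loss = segment_dpo_loss([-1.0, -0.1, -0.1], [-1.0, -2.0, -2.0], ref, ref, (1, 3), (1, 3))
print(loss)  # below log(2), since the policy already favours the chosen segment
```

Scoring only the span means tokens in a shared, already-correct prefix contribute nothing to the gradient, which is one way to focus preference training on the step where a long navigation trace first goes wrong.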