Deconstructing Spatial Complexity: Hierarchical Decomposition for LLM Spatial Reasoning

📅 2026-05-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Large language models perform poorly on spatial reasoning tasks, which hinders their application in embodied intelligence and related domains. To address this, this work decomposes complex spatial tasks into manageable subtasks through hierarchical task decomposition and proposes MCTS-Guided Group Relative Policy Optimization (M-GRPO) to improve the planning behind that decomposition. The method reformulates the UCT formula to combine the model's prior prediction probabilities with its epistemic uncertainty, and introduces a fine-grained advantage function to refine the decision-making process. Evaluated on navigation, planning, and strategic gaming benchmarks, M-GRPO substantially improves the spatial reasoning of large language models and achieves state-of-the-art performance across these domains.
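
For context on the advantage function mentioned above, the sketch below shows the standard group-relative advantage used in GRPO, where each sampled response's reward is normalized against the other responses in its group. It is only a baseline illustration: the paper's fine-grained advantage is not specified in this summary, and the function name, the `eps` parameter, and the example rewards are assumptions made for this example.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Baseline GRPO-style advantage: (reward - group mean) / group std.

    NOTE: this is the standard group-relative form, not the paper's
    fine-grained variant, which this summary does not describe.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    # eps (assumed value) avoids division by zero when all rewards are equal
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four sampled plans on the same spatial task
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Plans that beat the group average receive a positive advantage and the rest a negative one, so no separate value network is needed to form the baseline.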

📝 Abstract

LLMs have shown remarkable proficiency in general language understanding and reasoning. However, they consistently underperform in spatial reasoning, which severely limits their application, particularly in embodied intelligence. Inspired by the success of hierarchical reinforcement learning, this paper introduces a novel method for hierarchical task decomposition in LLM spatial reasoning. Our approach guides LLMs to decompose complex tasks into manageable sub-tasks by identifying key intermediate states and generating simplified sub-environments. However, we find that LLMs often fail to derive optimal intermediate states because their spatial priors are insufficient, leading to sub-optimal task decomposition. To address this limitation and enhance the model's planning capability, we propose MCTS-Guided Group Relative Policy Optimization (M-GRPO), in which we reformulate the UCT formula by incorporating the LLM's prior predictive probabilities alongside its epistemic uncertainty. Furthermore, we implement a more fine-grained advantage function, enabling the model to learn optimal path planning. Experimental results demonstrate that our method substantially improves LLM performance on spatial tasks, including navigation, planning, and strategic games, achieving state-of-the-art results. This work paves the way for LLMs in real-world applications.
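
The abstract does not give the reformulated UCT formula, so the following is a minimal sketch of one plausible shape for it, not the authors' method: an AlphaZero-style PUCT term weighted by the LLM's prior probability, plus an additive epistemic-uncertainty bonus. The `Child` class, the `uncertainty` field, and the constants `c_puct` and `beta` are all hypothetical names and values chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Child:
    prior: float        # LLM probability for this candidate intermediate state
    uncertainty: float  # hypothetical epistemic-uncertainty estimate in [0, 1]
    visits: int = 0
    value_sum: float = 0.0

    @property
    def mean_value(self) -> float:
        # Average return observed through this child (0 if never visited)
        return self.value_sum / self.visits if self.visits else 0.0


def selection_score(child, parent_visits, c_puct=1.0, beta=0.5):
    """Assumed score: mean value + prior-weighted exploration + uncertainty bonus."""
    exploration = c_puct * child.prior * math.sqrt(parent_visits) / (1 + child.visits)
    return child.mean_value + exploration + beta * child.uncertainty


def select_child(children, parent_visits):
    # Pick the child with the highest score under the assumed formula
    return max(children, key=lambda c: selection_score(c, parent_visits))


# Three hypothetical candidate intermediate states for one decomposition step
children = [
    Child(prior=0.7, uncertainty=0.1),
    Child(prior=0.2, uncertainty=0.2),
    Child(prior=0.1, uncertainty=0.9),
]
first_pick = select_child(children, parent_visits=4)
```

Under these assumptions the search initially follows the LLM's prior, while the visit count in the denominator and the uncertainty bonus gradually shift attention toward candidates the model rated low or is unsure about.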

Problem

Research questions and friction points this paper is trying to address.

spatial reasoning

large language models

embodied intelligence

task decomposition

navigation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Decomposition

Spatial Reasoning

MCTS-Guided Optimization