🤖 AI Summary
This work investigates whether large language models (LLMs) can spontaneously construct a consistent global spatial representation solely from local, relative, coordinate-free spatial descriptions (e.g., “A is east of B”). The representation covers both spatial perception (inferring global layouts) and spatial navigation (learning road networks and planning paths). The study has three methodological components: zero-shot/few-shot spatial relation reasoning, trajectory-text-driven road network modeling, and latent geometric alignment analysis. It provides the first systematic empirical validation that LLMs can acquire global spatial representations from unlabeled, fragmented spatial utterances: they generalize to relationships among unseen points of interest (POIs) in simulated cities, their latent embeddings align with ground-truth geographic distributions at a statistically significant level (p < 0.01), and they support high-accuracy end-to-end path planning. These findings reveal an implicit capacity of LLMs to model real-world spatial structure and point to a new paradigm for embodied AI and geographic reasoning.
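To make the spatial-perception setting concrete, here is a minimal sketch of how coordinate-free statements such as “A is east of B” could be produced from a hidden layout, and how a global west-to-east order can be recovered from a few local statements. The POI names, the toy coordinates, and the helpers `describe` and `west_to_east` are illustrative assumptions, not part of the study; the topological sort stands in for whatever the model learns internally.

```python
from graphlib import TopologicalSorter

# Hypothetical POIs on a toy grid (x grows eastward, y grows northward).
POIS = {"cafe": (0, 0), "park": (1, 3), "museum": (2, 1), "station": (3, 2)}


def describe(a, b, coords):
    """Return a coordinate-free statement about where `a` lies relative to `b`."""
    (ax, ay), (bx, by) = coords[a], coords[b]
    ns = "north" if ay > by else "south" if ay < by else ""
    ew = "east" if ax > bx else "west" if ax < bx else ""
    direction = ns + ew
    if not direction:
        return f"{a} is at the same location as {b}"
    return f"{a} is {direction} of {b}"


def west_to_east(statements):
    """Recover a global west-to-east order from local relative statements."""
    sorter = TopologicalSorter()
    for statement in statements:
        parts = statement.split()
        if len(parts) != 5:  # skip "same location" statements
            continue
        a, _, direction, _, b = parts
        if direction.endswith("east"):
            sorter.add(a, b)  # b lies west of a, so b comes first
        elif direction.endswith("west"):
            sorter.add(b, a)  # a lies west of b, so a comes first
    return list(sorter.static_order())


# Only three of the six possible pairs are described (the "seen" relations).
seen_pairs = [("cafe", "park"), ("park", "museum"), ("museum", "station")]
statements = [describe(a, b, POIS) for a, b in seen_pairs]
order = west_to_east(statements)
```

In this toy case the recovered order places `cafe` west of `station` even though that pair never appears in a statement, which is the kind of generalization to unseen POI relations the summary refers to.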
📝 Abstract
Recent advances in Large Language Models (LLMs) have demonstrated strong capabilities in tasks such as code and mathematics. However, their potential to internalize structured spatial knowledge remains underexplored. This study investigates whether LLMs, given only local, relative observations of the kind humans make, can construct coherent global spatial cognition by integrating fragmented relational descriptions. We focus on two core aspects of spatial cognition: spatial perception, where models infer consistent global layouts from local positional relationships, and spatial navigation, where models learn road connectivity from trajectory data and plan optimal paths between locations that are not directly connected. Experiments conducted in a simulated urban environment demonstrate that LLMs not only generalize to unseen spatial relationships between points of interest (POIs) but also exhibit latent representations aligned with real-world spatial distributions. Furthermore, LLMs can learn road connectivity from trajectory descriptions, enabling accurate path planning and dynamic spatial awareness during navigation.
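The navigation task can be illustrated with a small reference sketch: road connectivity is read off trajectories, and a path is then planned between two locations that never appear together in any single trajectory. Everything here is an assumption made for illustration, including the trajectory data, the two-way-road simplification, the helper names `build_road_graph` and `plan_path`, and the use of breadth-first search with "optimal" taken to mean fewest road segments. It shows the ground-truth computation a model would need to reproduce, not how an LLM performs it.

```python
from collections import defaultdict, deque

# Hypothetical trajectories: each is a sequence of consecutively visited POIs.
TRAJECTORIES = [
    ["cafe", "park", "museum"],
    ["museum", "station"],
    ["park", "library"],
]


def build_road_graph(trajectories):
    """Treat consecutive POIs in a trajectory as joined by a two-way road."""
    graph = defaultdict(set)
    for trajectory in trajectories:
        for u, v in zip(trajectory, trajectory[1:]):
            graph[u].add(v)
            graph[v].add(u)
    return graph


def plan_path(graph, start, goal):
    """Breadth-first search: returns a path with the fewest road segments, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        # Sorted iteration keeps the result deterministic across runs.
        for neighbor in sorted(graph.get(path[-1], ())):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


graph = build_road_graph(TRAJECTORIES)
route = plan_path(graph, "library", "station")
```

Here `library` and `station` never co-occur in a trajectory, yet a route between them exists once the three trajectories are merged into one road graph.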