SpatialText: A Pure-Text Cognitive Benchmark for Spatial Understanding in Large Language Models

📅 2026-03-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of disentangling spatial reasoning from linguistic statistical heuristics in large language models (LLMs), a distinction that existing evaluations fail to draw, often conflating spatial cognition with visual perception in multimodal settings. To isolate and assess intrinsic spatial representational capabilities, the authors propose SpatialText, the first purely textual cognitive diagnostic framework. It integrates human-annotated descriptions of real 3D environments with procedurally generated logical scenarios, grounded in cognitive-science principles, to evaluate spatial understanding without visual confounds. The study finds that while mainstream LLMs can retrieve global spatial facts, they exhibit systematic deficiencies in egocentric perspective-taking and local reference-frame reasoning, indicating a reliance on lexical co-occurrence patterns rather than the construction of coherent, embodied spatial mental models.

📝 Abstract
Genuine spatial reasoning relies on the capacity to construct and manipulate coherent internal spatial representations, often conceptualized as mental models, rather than merely processing surface linguistic associations. While large language models exhibit advanced capabilities across various domains, existing benchmarks fail to isolate this intrinsic spatial cognition from statistical language heuristics. Furthermore, multimodal evaluations frequently conflate genuine spatial reasoning with visual perception. To systematically investigate whether models construct flexible spatial mental models, we introduce SpatialText, a theory-driven diagnostic framework. Rather than functioning simply as a dataset, SpatialText isolates text-based spatial reasoning through a dual-source methodology. It integrates human-annotated descriptions of real 3D indoor environments, which capture natural ambiguities, perspective shifts, and functional relations, with code-generated, logically precise scenes designed to probe formal spatial deduction and epistemic boundaries. Systematic evaluation across state-of-the-art models reveals fundamental representational limitations. Although models demonstrate proficiency in retrieving explicit spatial facts and operating within global, allocentric coordinate systems, they exhibit critical failures in egocentric perspective transformation and local reference frame reasoning. These systematic errors provide strong evidence that current models rely heavily on linguistic co-occurrence heuristics rather than constructing coherent, verifiable internal spatial representations. SpatialText thus serves as a rigorous instrument for diagnosing the cognitive boundaries of artificial spatial intelligence.
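The allocentric/egocentric distinction at the heart of the abstract can be made concrete with a toy example. The sketch below is purely illustrative and is not the paper's actual scene generator: the object names, coordinate layout, and question templates are all assumptions. It shows the two kinds of query the benchmark contrasts, a global-coordinate (allocentric) fact and a perspective-dependent (egocentric) relation that requires transforming the scene into the agent's reference frame.

```python
# Illustrative sketch only: a minimal procedurally generated spatial scene
# with one allocentric query and one egocentric query. Names, coordinates,
# and conventions are assumptions, not taken from the SpatialText paper.

# Allocentric layout: global (x, y) coordinates, +y = north, +x = east.
scene = {
    "lamp":  (0, 2),
    "table": (0, 0),
    "chair": (2, 0),
}

def north_of(scene, a, b):
    """Allocentric fact: is object a north of object b in global coordinates?"""
    return scene[a][1] > scene[b][1]

def egocentric_side(scene, agent_pos, facing, obj):
    """Egocentric relation: is obj to the agent's left or right?

    facing is a unit vector (dx, dy); the sign of the 2D cross product
    between the facing direction and the agent-to-object vector gives
    the side the object lies on.
    """
    ox, oy = scene[obj]
    ax, ay = agent_pos
    fx, fy = facing
    cross = fx * (oy - ay) - fy * (ox - ax)
    if cross > 0:
        return "left"
    if cross < 0:
        return "right"
    return "ahead/behind"

# Allocentric query: the lamp is north of the table.
print(north_of(scene, "lamp", "table"))                  # True

# Egocentric query: standing at the table facing north (0, 1),
# the chair (to the east) is on the agent's right.
print(egocentric_side(scene, (0, 0), (0, 1), "chair"))   # right
```

Answering the allocentric query needs only a lookup over stated facts, while the egocentric query requires combining the agent's position and heading, exactly the transformation the paper reports models failing at.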
Problem

Research questions and friction points this paper is trying to address.

spatial reasoning
mental models
large language models
cognitive benchmark
spatial representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

spatial reasoning
mental models
large language models
cognitive benchmark
egocentric perspective
👥 Authors
Peiyao Jiang, Zhejiang University
Zequn Qin, Zhejiang University (computer vision, deep learning, machine learning)
Xi Li, Zhejiang University