Does Spatial Cognition Emerge in Frontier Models?

📅 2024-10-09
🏛️ arXiv.org
📈 Citations: 9
Influential: 0
🤖 AI Summary
This work asks whether state-of-the-art large language models and large multimodal models possess animal-like spatial cognition, specifically large-scale environmental mapping, fine-grained reasoning about object shapes and layouts, and spatial attention and memory. Method: The authors introduce SPACE, the first spatial cognition benchmark grounded in cognitive science that spans both text and images. Many tasks are presented in parallel textual and visual forms with standardized test sets, so the same abilities can be evaluated in both modalities. Contribution/Results: Current leading models perform near chance on a number of classic tests of animal spatial cognition (e.g., allocentric navigation, object permanence, and spatial working memory) and fall well short of animal performance. The study thus exposes a fundamental deficit in the spatial reasoning of foundation models and establishes a cognitively grounded paradigm for assessing spatial intelligence in AI.

📝 Abstract
Not yet. We present SPACE, a benchmark that systematically evaluates spatial cognition in frontier models. Our benchmark builds on decades of research in cognitive science. It evaluates large-scale mapping abilities that are brought to bear when an organism traverses physical environments, smaller-scale reasoning about object shapes and layouts, and cognitive infrastructure such as spatial attention and memory. For many tasks, we instantiate parallel presentations via text and images, allowing us to benchmark both large language models and large multimodal models. Results suggest that contemporary frontier models fall short of the spatial intelligence of animals, performing near chance level on a number of classic tests of animal cognition. Code and data are available: https://github.com/apple/ml-space-benchmark
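The abstract's "near chance level" claim can be made concrete with a small calculation. The sketch below assumes a k-way multiple-choice task, where uniform random guessing has an expected accuracy of 1/k, and uses a one-sided exact binomial test to ask whether an observed accuracy is distinguishable from guessing. The function names, the 4-option format, and the item counts are hypothetical illustrations; they are not taken from the SPACE code or from the paper's results.

```python
from math import comb


def chance_accuracy(num_options: int) -> float:
    """Expected accuracy of uniform random guessing on a k-way multiple-choice item."""
    if num_options < 2:
        raise ValueError("a multiple-choice item needs at least two options")
    return 1.0 / num_options


def binomial_p_value_above_chance(correct: int, total: int, num_options: int) -> float:
    """One-sided exact binomial test: P(X >= correct) if every answer were a random guess."""
    p = chance_accuracy(num_options)
    return sum(
        comb(total, i) * p**i * (1 - p) ** (total - i)
        for i in range(correct, total + 1)
    )


# Hypothetical numbers for illustration only; these are NOT results from the paper.
total_items = 100
options = 4
correct = 29

acc = correct / total_items
p_val = binomial_p_value_above_chance(correct, total_items, options)
print(f"accuracy={acc:.2f}, chance={chance_accuracy(options):.2f}, p={p_val:.3f}")
```

A large p-value here means the hypothetical accuracy is statistically consistent with guessing; the sketch does not describe how SPACE itself scores or aggregates its tasks.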

Problem

Research questions and friction points this paper is trying to address.

Systematically evaluates spatial cognition in frontier models
Assesses large-scale mapping and small-scale spatial reasoning abilities
Compares model performance with the spatial intelligence of animals

Innovation

Methods, ideas, or system contributions that make the work stand out.

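The SPACE benchmark systematically evaluates spatial cognition, building on decades of cognitive science research
Tests large-scale mapping and small-scale reasoning about object shapes and layouts
Presents many tasks in parallel text and image forms, so both large language models and large multimodal models can be assessed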
Santhosh K. Ramakrishnan
Apple
Erik Wijmans
Apple
Computer Vision, Machine Learning, Embodied AI
Philipp Kraehenbuehl
Apple
V. Koltun
Apple