🤖 AI Summary
This work asks whether state-of-the-art large language and multimodal models possess animal-like spatial cognition: large-scale mapping of traversed environments, fine-grained reasoning about object shapes and layouts, and supporting faculties such as spatial attention and memory. Method: The authors introduce SPACE, a cognitive-science-driven, dual-modal benchmark that systematically evaluates spatial cognition across textual and visual modalities via parallel presentations of the same tasks and standardized test sets. Contribution/Results: Empirical evaluation shows that current leading models perform near chance on canonical tests of animal spatial cognition (e.g., allocentric navigation, object permanence, and spatial working memory), falling well short of biological intelligence. The study thus exposes a fundamental deficit in the spatial reasoning of foundation models and establishes a cognitively grounded paradigm for assessing spatial intelligence in AI.
📝 Abstract
Not yet. We present SPACE, a benchmark that systematically evaluates spatial cognition in frontier models. Our benchmark builds on decades of research in cognitive science. It evaluates large-scale mapping abilities that are brought to bear when an organism traverses physical environments, smaller-scale reasoning about object shapes and layouts, and cognitive infrastructure such as spatial attention and memory. For many tasks, we instantiate parallel presentations via text and images, allowing us to benchmark both large language models and large multimodal models. Results suggest that contemporary frontier models fall short of the spatial intelligence of animals, performing at near-chance levels on a number of classic tests of animal cognition. Code and data are available at https://github.com/apple/ml-space-benchmark.
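To make the dual-modal setup concrete, here is a minimal Python sketch of how paired text/image task items and a modality-aware evaluation loop might look. The `SpatialTask` schema, the `evaluate` helper, and the baseline model are hypothetical illustrations, not the actual format or API of the ml-space-benchmark repository.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SpatialTask:
    """One benchmark item with parallel text and image presentations.

    Hypothetical schema for illustration; not the repository's real format.
    """
    task_id: str
    text_prompt: str           # textual rendering of the stimulus/environment
    image_path: Optional[str]  # visual rendering, if one exists for this task
    choices: list[str]         # multiple-choice options
    answer: str                # gold label


def evaluate(tasks: list[SpatialTask], model: Callable[..., str],
             multimodal: bool) -> float:
    """Score a model on whichever presentation matches its modality."""
    correct = 0
    for t in tasks:
        if multimodal and t.image_path is not None:
            pred = model(t.text_prompt, image=t.image_path, choices=t.choices)
        else:
            pred = model(t.text_prompt, choices=t.choices)
        correct += pred == t.answer
    return correct / len(tasks)


if __name__ == "__main__":
    tasks = [SpatialTask(
        task_id="demo-1",
        text_prompt="You walk north, turn right, then turn right again. "
                    "Which direction are you now facing?",
        image_path=None,
        choices=["north", "south", "east", "west"],
        answer="south",
    )]
    # Trivial baseline that always picks the first option; a 4-way
    # multiple-choice task has a chance level of 1/4 = 0.25.
    baseline = lambda prompt, choices, image=None: choices[0]
    acc = evaluate(tasks, baseline, multimodal=False)
    print(f"accuracy = {acc:.2f} (chance = 0.25)")
```

Keeping the text and image renderings paired under one task ID is what allows the same items to probe both language-only and multimodal models, which is the comparison the benchmark is built around; the paper's headline finding is that frontier models often land near the chance floor on such items.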