🤖 AI Summary
This work asks whether state-of-the-art large language and multimodal models possess animal-like spatial cognition: large-scale mapping of traversed environments, fine-grained reasoning about object shapes and layouts, and supporting faculties such as spatial attention and memory. Method: The authors introduce SPACE, a cognitive-science-driven, dual-modal benchmark that systematically evaluates spatial cognition across textual and visual modalities via parallel presentations of the same tasks and standardized test sets. Contribution/Results: Empirical evaluation shows that current leading models perform near chance on canonical tests of animal spatial cognition (e.g., allocentric navigation, object permanence, and spatial working memory), falling well short of biological intelligence. The study thus exposes a fundamental deficit in the spatial reasoning of foundation models and establishes a cognitively grounded paradigm for assessing spatial intelligence in AI.
📝 Abstract
Not yet. We present SPACE, a benchmark that systematically evaluates spatial cognition in frontier models. Our benchmark builds on decades of research in cognitive science. It evaluates large-scale mapping abilities that are brought to bear when an organism traverses physical environments, smaller-scale reasoning about object shapes and layouts, and cognitive infrastructure such as spatial attention and memory. For many tasks, we instantiate parallel presentations via text and images, allowing us to benchmark both large language models and large multimodal models. Results suggest that contemporary frontier models fall short of the spatial intelligence of animals, performing at near-chance levels on a number of classic tests of animal cognition. Code and data are available at https://github.com/apple/ml-space-benchmark.
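To make the dual-modal setup concrete, here is a minimal Python sketch of how paired text/image task items and a modality-aware evaluation loop might look. The `SpatialTask` schema, the `evaluate` helper, and the baseline model are hypothetical illustrations, not the actual format or API of the ml-space-benchmark repository.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SpatialTask:
    """One benchmark item with parallel text and image presentations.

    Hypothetical schema for illustration; not the repository's real format.
    """
    task_id: str
    text_prompt: str           # textual rendering of the stimulus/environment
    image_path: Optional[str]  # visual rendering, if one exists for this task
    choices: list[str]         # multiple-choice options
    answer: str                # gold label


def evaluate(tasks: list[SpatialTask], model: Callable[..., str],
             multimodal: bool) -> float:
    """Score a model on whichever presentation matches its modality."""
    correct = 0
    for t in tasks:
        if multimodal and t.image_path is not None:
            pred = model(t.text_prompt, image=t.image_path, choices=t.choices)
        else:
            pred = model(t.text_prompt, choices=t.choices)
        correct += pred == t.answer
    return correct / len(tasks)


if __name__ == "__main__":
    tasks = [SpatialTask(
        task_id="demo-1",
        text_prompt="You walk north, turn right, then turn right again. "
                    "Which direction are you now facing?",
        image_path=None,
        choices=["north", "south", "east", "west"],
        answer="south",
    )]
    # Trivial baseline that always picks the first option; a 4-way
    # multiple-choice task has a chance level of 1/4 = 0.25.
    baseline = lambda prompt, choices, image=None: choices[0]
    acc = evaluate(tasks, baseline, multimodal=False)
    print(f"accuracy = {acc:.2f} (chance = 0.25)")
```

Keeping the text and image renderings paired under one task ID is what allows the same items to probe both language-only and multimodal models, which is the comparison the benchmark is built around; the paper's headline finding is that frontier models often land near the chance floor on such items.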