Theory of Space: Can Foundation Models Construct Spatial Beliefs through Active Exploration?

📅 2026-02-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge that foundation models face in constructing, updating, and maintaining coherent spatial beliefs in partially observable environments. We propose a “Theory of Space” framework that systematically evaluates agents’ ability to build cognitive maps and form revisable spatial beliefs from sequences of local observations through curiosity-driven active exploration tasks. By introducing a spatial belief probing mechanism—integrating cognitive mapping benchmarks, false-belief paradigms, and comparisons with procedural agents—we uncover critical limitations in current models, including a performance gap between active and passive settings, inefficient exploration, unstable beliefs, and belief inertia. Our experiments demonstrate that foundation models suffer significant performance degradation during active exploration and that vision-based models exhibit greater difficulty than text-based models in revising outdated spatial beliefs.

📝 Abstract

Spatial embodied intelligence requires agents to act in order to acquire information under partial observability. While multimodal foundation models excel at passive perception, their capacity for active, self-directed exploration remains understudied. We propose Theory of Space, defined as an agent's ability to acquire information through self-directed exploration and to construct, revise, and exploit a spatial belief from sequential, partial observations. We evaluate this ability with a benchmark whose goal is curiosity-driven exploration to build an accurate cognitive map. A key innovation is spatial belief probing, which prompts models to reveal their internal spatial representations at each step. Our evaluation of state-of-the-art models reveals several critical bottlenecks. First, we identify an Active-Passive Gap: performance drops significantly when agents must gather information autonomously. Second, models explore inefficiently and unsystematically compared with program-based proxies. Third, belief probing shows that while perception is an initial bottleneck, global beliefs are also unstable, so spatial knowledge degrades over time. Finally, using a false belief paradigm, we uncover Belief Inertia, where agents fail to update obsolete priors in light of new evidence. This issue is present in text-based agents but is particularly severe in vision-based models. Our findings suggest that current foundation models struggle to maintain coherent, revisable spatial beliefs during active exploration.
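
To make the construct, revise, and probe cycle described above concrete, here is a minimal toy sketch. It is not the paper's benchmark or probing protocol: the grid cells, the `SpatialBelief` class, and the `observe`, `probe_accuracy`, and `run_demo` helpers are all hypothetical names chosen for illustration. The belief stores where each object was last seen, is probed after every partial observation, and must be revised after an object is moved out of view (a simplified stand-in for the false belief setting).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]  # hypothetical grid coordinate


@dataclass
class SpatialBelief:
    """Toy belief: maps each object name to the cell where it was last seen."""
    locations: Dict[str, Cell] = field(default_factory=dict)

    def update(self, observation: Dict[str, Cell], visible: Set[Cell]) -> None:
        # Revise: forget any object believed to be in a now-visible cell
        # that the current observation does not confirm there.
        for name, cell in list(self.locations.items()):
            if cell in visible and observation.get(name) != cell:
                del self.locations[name]
        # Construct: record every object in the current partial view.
        self.locations.update(observation)


def observe(world: Dict[str, Cell], visible: Set[Cell]) -> Dict[str, Cell]:
    """Partial observation: only objects inside the visible cells."""
    return {name: cell for name, cell in world.items() if cell in visible}


def probe_accuracy(belief: SpatialBelief, world: Dict[str, Cell]) -> float:
    """Fraction of objects whose believed cell matches the true cell."""
    if not world:
        return 1.0
    hits = sum(1 for n, c in world.items() if belief.locations.get(n) == c)
    return hits / len(world)


def run_demo() -> List[float]:
    world = {"lamp": (0, 0), "chair": (2, 1)}
    belief = SpatialBelief()
    trace = []
    views = [{(0, 0), (0, 1)}, {(2, 0), (2, 1)}, {(0, 0), (0, 1)}, {(1, 1)}]
    for step, visible in enumerate(views):
        if step == 2:
            # The lamp is moved while unobserved, so the belief is now stale.
            world["lamp"] = (1, 1)
        belief.update(observe(world, visible), visible)
        trace.append(probe_accuracy(belief, world))  # probe after each step
    return trace


print(run_demo())  # → [0.5, 1.0, 0.5, 1.0]
```

In this sketch, revisiting the lamp's old cell at the third step removes the obsolete entry, and the fourth step restores full accuracy. An agent showing belief inertia would instead keep reporting the old location after seeing contradicting evidence.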

Problem

Research questions and friction points this paper is trying to address.

spatial belief

active exploration

foundation models

partial observability

belief updating

Innovation

Methods, ideas, or system contributions that make the work stand out.

Theory of Space

spatial belief probing

active exploration