Theory of Space: Can Foundation Models Construct Spatial Beliefs through Active Exploration?

📅 2026-02-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge that foundation models face in constructing, updating, and maintaining coherent spatial beliefs in partially observable environments. We propose a “Theory of Space” framework that systematically evaluates agents’ ability to build cognitive maps and form revisable spatial beliefs from sequences of local observations through curiosity-driven active exploration tasks. By introducing a spatial belief probing mechanism—integrating cognitive mapping benchmarks, false-belief paradigms, and comparisons with procedural agents—we uncover critical limitations in current models, including a performance gap between active and passive settings, inefficient exploration, unstable beliefs, and belief inertia. Our experiments demonstrate that foundation models suffer significant performance degradation during active exploration and that vision-based models exhibit greater difficulty than text-based models in revising outdated spatial beliefs.

📝 Abstract
Spatial embodied intelligence requires agents to act in order to acquire information under partial observability. While multimodal foundation models excel at passive perception, their capacity for active, self-directed exploration remains understudied. We propose Theory of Space, defined as an agent's ability to acquire information through self-directed exploration and to construct, revise, and exploit a spatial belief from sequential, partial observations. We evaluate this ability with a benchmark whose goal is curiosity-driven exploration to build an accurate cognitive map. A key innovation is spatial belief probing, which prompts models to reveal their internal spatial representations at each step. Our evaluation of state-of-the-art models reveals several critical bottlenecks. First, we identify an Active-Passive Gap: performance drops significantly when agents must gather information autonomously. Second, we find high inefficiency: models explore unsystematically compared to program-based proxies. Third, belief probing shows that while perception is an initial bottleneck, global beliefs are also unstable, which causes spatial knowledge to degrade over time. Finally, using a false-belief paradigm, we uncover Belief Inertia, where agents fail to update obsolete priors with new evidence. This issue is present in text-based agents but is particularly severe in vision-based models. Our findings suggest that current foundation models struggle to maintain coherent, revisable spatial beliefs during active exploration.
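To make the ideas of belief probing and Belief Inertia concrete, the following is a minimal toy sketch in Python. It is not the paper's benchmark or code. The names `GridWorld`, `SpatialBelief`, `probe`, `run_episode`, and the `revise` flag are all hypothetical, and the scripted agent stands in for a model. The sketch assumes a small grid with named objects, a limited field of view, a probe that scores the belief against ground truth after every step, and a false-belief test in which one object is moved before the agent looks again.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]


@dataclass
class GridWorld:
    """Hypothetical toy environment: named objects placed on a grid."""
    objects: Dict[str, Cell]
    view_radius: int = 1

    def observe(self, pos: Cell) -> Dict[Cell, Optional[str]]:
        """Return the contents of each cell within view_radius (None = empty)."""
        by_cell = {cell: name for name, cell in self.objects.items()}
        x0, y0 = pos
        r = self.view_radius
        return {
            (x, y): by_cell.get((x, y))
            for x in range(x0 - r, x0 + r + 1)
            for y in range(y0 - r, y0 + r + 1)
        }


@dataclass
class SpatialBelief:
    """The agent's current guess of where each object is."""
    locations: Dict[str, Cell] = field(default_factory=dict)

    def update(self, observation: Dict[Cell, Optional[str]], revise: bool) -> None:
        for cell, name in observation.items():
            if revise:
                # Drop any object believed to be here that is no longer seen here.
                stale = [n for n, c in self.locations.items() if c == cell and n != name]
                for n in stale:
                    del self.locations[n]
            # With revise=False, an existing belief is never overwritten
            # (an assumed, simplified stand-in for belief inertia).
            if name is not None and (revise or name not in self.locations):
                self.locations[name] = cell


def probe(belief: SpatialBelief, world: GridWorld) -> float:
    """Belief probe: fraction of objects whose believed cell matches ground truth."""
    correct = sum(belief.locations.get(n) == c for n, c in world.objects.items())
    return correct / len(world.objects)


def run_episode(revise: bool) -> List[float]:
    """Explore along a fixed path, probe after each step, then run a false-belief test."""
    world = GridWorld({"lamp": (0, 0), "chair": (3, 0)})
    belief = SpatialBelief()
    scores = []
    for pos in [(0, 0), (1, 0), (2, 0), (3, 0)]:
        belief.update(world.observe(pos), revise)
        scores.append(probe(belief, world))
    # False-belief step: the lamp is moved, then the agent looks again from the start.
    world.objects["lamp"] = (1, 1)
    belief.update(world.observe((0, 0)), revise)
    scores.append(probe(belief, world))
    return scores


if __name__ == "__main__":
    print("revising agent:", run_episode(revise=True))
    print("inert agent:   ", run_episode(revise=False))
```

In this toy setup the two agents score identically during exploration and differ only on the final probe, after the lamp has moved: the revising agent relocates the lamp, while the inert agent keeps the obsolete location despite seeing the new evidence. Probing at every step is what exposes where the belief goes wrong, which a single end-of-episode score would not show.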
Problem

Research questions and friction points this paper is trying to address.

spatial belief
active exploration
foundation models
partial observability
belief updating
Innovation

Methods, ideas, or system contributions that make the work stand out.

Theory of Space
spatial belief probing
active exploration
belief inertia
cognitive map