Map2Thought: Explicit 3D Spatial Reasoning via Metric Cognitive Maps

📅 2026-01-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes Metric-CogMap and Cognitive Chain-of-Thought (Cog-CoT) to give 3D vision-language models an explicit, interpretable spatial reasoning mechanism. Metric-CogMap combines a discrete grid with a continuous metric-scale representation, and Cog-CoT reasons over it using deterministic vector operations, bounding-box distances, and occlusion-aware appearance-order cues. On VSI-Bench, the model reaches 59.9% accuracy with only 50% of the training data, close to the 60.9% full-data baseline, and outperforms existing state-of-the-art methods by 5.3%, 4.8%, and 4.0% under the 10%, 25%, and 50% training-data regimes, respectively. These results indicate reduced reliance on supervised annotations while maintaining competitive performance.

📝 Abstract

We propose Map2Thought, a framework that enables explicit and interpretable spatial reasoning for 3D VLMs. The framework is grounded in two key components: Metric Cognitive Map (Metric-CogMap) and Cognitive Chain-of-Thought (Cog-CoT). Metric-CogMap provides a unified spatial representation by integrating a discrete grid for relational reasoning with a continuous, metric-scale representation for precise geometric understanding. Building upon the Metric-CogMap, Cog-CoT performs explicit geometric reasoning through deterministic operations, including vector operations, bounding-box distances, and occlusion-aware appearance-order cues, producing interpretable inference traces grounded in 3D structure. Experimental results on VSI-Bench show that Map2Thought enables explainable 3D understanding, achieving 59.9% accuracy using only half the supervision, closely matching the 60.9% baseline trained with the full dataset. It consistently outperforms state-of-the-art methods by 5.3%, 4.8%, and 4.0% under 10%, 25%, and 50% training subsets, respectively.
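To make the kinds of deterministic operations named in the abstract more concrete, here is a minimal Python sketch of two of them (a vector operation and a bounding-box distance) alongside a discrete grid lookup. It is an illustration under stated assumptions, not the authors' implementation: the `Box3D` class, the function names, the axis-aligned box format, and the 0.5 m cell size are all hypothetical choices that do not come from the paper.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Box3D:
    """Hypothetical axis-aligned 3D box in metres (not the paper's format)."""
    label: str
    center: Vec3
    size: Vec3

    def bounds(self) -> Tuple[Vec3, Vec3]:
        # Lower and upper corners derived from centre and full extents.
        lo = tuple(c - s / 2 for c, s in zip(self.center, self.size))
        hi = tuple(c + s / 2 for c, s in zip(self.center, self.size))
        return lo, hi


def grid_cell(box: Box3D, cell_size: float = 0.5) -> Tuple[int, int]:
    """Discrete view: top-down (x, y) grid cell containing the box centre."""
    x, y, _ = box.center
    return math.floor(x / cell_size), math.floor(y / cell_size)


def center_offset(a: Box3D, b: Box3D) -> Vec3:
    """Vector operation: displacement from the centre of a to the centre of b."""
    return tuple(q - p for p, q in zip(a.center, b.center))


def box_distance(a: Box3D, b: Box3D) -> float:
    """Metric view: shortest distance between two axis-aligned boxes.

    The per-axis gap is zero when the boxes overlap on that axis.
    """
    (a_lo, a_hi), (b_lo, b_hi) = a.bounds(), b.bounds()
    gaps = [
        max(bl - ah, al - bh, 0.0)
        for al, ah, bl, bh in zip(a_lo, a_hi, b_lo, b_hi)
    ]
    return math.sqrt(sum(g * g for g in gaps))


# Made-up example objects, for illustration only.
chair = Box3D("chair", center=(1.0, 1.0, 0.5), size=(1.0, 1.0, 1.0))
table = Box3D("table", center=(4.0, 1.0, 0.5), size=(2.0, 1.0, 1.0))

print(grid_cell(chair), grid_cell(table))
print(center_offset(chair, table))
print(box_distance(chair, table))
```

Because every step is a closed-form computation on stored coordinates, each intermediate value (cell index, offset vector, gap per axis) can be written out as part of a reasoning trace, which is the sense in which such operations are interpretable.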

Problem

Research questions and friction points this paper is trying to address.

3D spatial reasoning

vision-language models

cognitive maps

interpretable reasoning

metric representation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Metric Cognitive Map

Cognitive Chain-of-Thought

3D spatial reasoning