Spatial Atlas: Compute-Grounded Reasoning for Spatial-Aware Research Agent Benchmarks

📅 2026-04-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of large language models in complex spatial reasoning: their susceptibility to hallucination, which undermines both accuracy and interpretability. To mitigate this, the authors propose compute-grounded reasoning (CGR), a design paradigm in which every answerable sub-problem is resolved by deterministic computation before a language model generates an answer. In the Spatial Atlas system, a structured spatial scene graph engine computes distances and safety violations deterministically and passes the computed facts to the language model. Entropy-guided action selection routes queries across a three-tier model stack drawing on OpenAI and Anthropic models. The system also includes a self-healing ML pipeline with strategy-aware code generation, score-driven iterative refinement, and a prompt-based leak audit registry. Evaluated on the FieldWorkArena and MLE-Bench benchmarks, CGR achieves competitive accuracy while maintaining interpretability through structured intermediate representations and deterministic spatial computations.

📝 Abstract

We introduce compute-grounded reasoning (CGR), a design paradigm for spatial-aware research agents in which every answerable sub-problem is resolved by deterministic computation before a language model is asked to generate. Spatial Atlas instantiates CGR as a single Agent-to-Agent (A2A) server that handles two challenging benchmarks: FieldWorkArena, a multimodal spatial question-answering benchmark spanning factory, warehouse, and retail environments, and MLE-Bench, a suite of 75 Kaggle machine learning competitions requiring end-to-end ML engineering. A structured spatial scene graph engine extracts entities and relations from vision descriptions, computes distances and safety violations deterministically, then feeds computed facts to large language models, thereby avoiding hallucinated spatial reasoning. Entropy-guided action selection maximizes information gain per step and routes queries across a three-tier frontier model stack (OpenAI + Anthropic). A self-healing ML pipeline with strategy-aware code generation, a score-driven iterative refinement loop, and a prompt-based leak audit registry round out the system. We evaluate across both benchmarks and show that CGR yields competitive accuracy while maintaining interpretability through structured intermediate representations and deterministic spatial computations.
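The compute-then-generate idea described in the abstract can be illustrated with a minimal sketch. This is not the Spatial Atlas implementation: the `Entity` class, the `MIN_CLEARANCE` rule table, the 3-metre threshold, and the prompt layout are all hypothetical, chosen only to show distances and safety violations being computed deterministically and then handed to a language model as fixed facts.

```python
from dataclasses import dataclass
from itertools import combinations
from math import dist


@dataclass(frozen=True)
class Entity:
    """A node in a toy scene graph (hypothetical schema)."""
    name: str
    kind: str                        # e.g. "worker", "forklift", "shelf"
    position: tuple[float, float]    # metres, in a shared floor-plan frame


# Assumed safety rule: minimum clearance in metres between two entity kinds.
MIN_CLEARANCE = {frozenset({"worker", "forklift"}): 3.0}


def compute_facts(entities):
    """Compute pairwise distances and clearance violations, with no model call."""
    facts = []
    for a, b in combinations(entities, 2):
        d = dist(a.position, b.position)
        facts.append(f"distance({a.name}, {b.name}) = {d:.2f} m")
        limit = MIN_CLEARANCE.get(frozenset({a.kind, b.kind}))
        if limit is not None and d < limit:
            facts.append(
                f"violation({a.name}, {b.name}): {d:.2f} m < {limit:.2f} m clearance"
            )
    return facts


def build_prompt(question, facts):
    """Place the computed facts ahead of the question in the model prompt."""
    lines = ["Computed facts (treat as ground truth):"]
    lines += [f"- {fact}" for fact in facts]
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


if __name__ == "__main__":
    scene = [
        Entity("worker_1", "worker", (0.0, 0.0)),
        Entity("forklift_A", "forklift", (0.0, 2.0)),
        Entity("shelf_3", "shelf", (3.0, 4.0)),
    ]
    print(build_prompt("Is anyone too close to a forklift?", compute_facts(scene)))
```

In this sketch the language model never estimates a distance itself; it only reads the facts listed in the prompt, which is the property the abstract attributes to CGR.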

Problem

Research questions and friction points this paper is trying to address.

spatial reasoning

hallucination

deterministic computation

research agent

benchmark

Innovation

Methods, ideas, or system contributions that make the work stand out.

compute-grounded reasoning

spatial scene graph

deterministic spatial computation