Geometrically-Constrained Agent for Spatial Reasoning

📅 2025-11-27
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Vision-language models (VLMs) suffer from a fundamental semantic-geometric misalignment in spatial reasoning, leading to unverifiable inference and uncontrolled planning. To address this, we propose the Geometrically-Constrained Agent (GCA), a training-free paradigm that explicitly enforces a formal task constraint throughout the reasoning process. GCA decouples semantic parsing (the VLM translates the query into a constraint) from geometric solving (tool calls executed within the deterministic bounds of that constraint), thereby avoiding the reliance on imperfect "oracle" supervision that affects training-based methods. The constraint makes the reasoning pathway verifiable and keeps it within well-defined geometric bounds. Evaluated on multiple spatial reasoning benchmarks, GCA achieves state-of-the-art performance, surpassing existing training-based and tool-integrated methods by roughly 27%.

📝 Abstract
Vision Language Models (VLMs) exhibit a fundamental semantic-to-geometric gap in spatial reasoning: they excel at qualitative semantic inference, but their reasoning operates within a lossy semantic space that is misaligned with high-fidelity geometry. Current paradigms fail to bridge this gap. Training-based methods suffer from an "oracle paradox," learning flawed spatial logic from imperfect oracles. Tool-integrated methods constrain the final computation but critically leave the VLM's planning process unconstrained, resulting in geometrically flawed plans. In this work, we propose the Geometrically-Constrained Agent (GCA), a training-free agentic paradigm that resolves this gap by introducing a formal task constraint. Specifically, we strategically decouple the VLM's role into two stages. First, acting as a semantic analyst, the VLM translates the user's ambiguous query into the formal, verifiable task constraint, which defines the reference frame and objective. Second, acting as a task solver, the VLM generates and executes tool calls strictly within the deterministic bounds defined by the constraint. This geometrically-constrained reasoning strategy successfully resolves the semantic-to-geometric gap, yielding a robust and verifiable reasoning pathway for spatial reasoning. Comprehensive experiments demonstrate that GCA achieves SOTA performance on multiple spatial reasoning benchmarks, surpassing existing training-based and tool-integrated methods by ~27%. Please see our homepage at https://gca-spatial-reasoning.github.io.
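To make the two-stage idea concrete, here is a minimal sketch of what a task constraint with a reference frame and an objective could look like, and how a deterministic solver might consume it. This is an illustration only, not the authors' implementation: the names TaskConstraint, to_frame, and solve, the 2D scene, and the "left_or_right" objective are all assumptions introduced for this example. In the abstract's terms, the VLM (semantic analyst) would produce the constraint; here it is written by hand.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class TaskConstraint:
    """Hypothetical formal task constraint: a reference frame plus an objective."""
    frame_origin: Point        # observer position in world coordinates
    frame_heading_deg: float   # observer facing direction, counter-clockwise from +x
    objective: str             # this sketch supports only "left_or_right"
    target: str                # name of the object being asked about


def to_frame(point: Point, c: TaskConstraint) -> Point:
    """Express a world point as (forward, left) offsets in the observer's frame."""
    dx = point[0] - c.frame_origin[0]
    dy = point[1] - c.frame_origin[1]
    theta = math.radians(c.frame_heading_deg)
    forward = dx * math.cos(theta) + dy * math.sin(theta)
    left = -dx * math.sin(theta) + dy * math.cos(theta)
    return forward, left


def solve(c: TaskConstraint, scene: Dict[str, Point]) -> str:
    """Deterministic solver: the answer depends only on the constraint and the scene."""
    if c.objective != "left_or_right":
        raise ValueError(f"unsupported objective: {c.objective}")
    _, left = to_frame(scene[c.target], c)
    if abs(left) < 1e-9:
        return "aligned"
    return "left" if left > 0 else "right"


# Same scene, two different reference frames (both hand-written assumptions).
scene = {"chair": (1.0, 2.0)}
facing_east = TaskConstraint((0.0, 0.0), 0.0, "left_or_right", "chair")
facing_west = TaskConstraint((0.0, 0.0), 180.0, "left_or_right", "chair")
print(solve(facing_east, scene))  # left
print(solve(facing_west, scene))  # right
```

The same chair is "left" for an observer facing east and "right" for one facing west, which is why fixing the reference frame before any computation matters: once the frame is explicit, the answer can be checked independently of the language model.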
Problem

Research questions and friction points this paper is trying to address.

Bridging semantic-geometric gap in VLMs' spatial reasoning
Resolving flawed planning in tool-integrated spatial methods
Overcoming oracle paradox in training-based spatial approaches
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decouples VLM into semantic analyst and task solver
Introduces formal task constraint for verifiable spatial reasoning
Generates tool calls within deterministic geometric bounds
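One way to picture "tool calls within deterministic bounds" is a guard that checks every proposed call against the constraint before running it. The sketch below is a hedged illustration under stated assumptions, not the paper's mechanism: Bounds, ConstraintViolation, run_tool_call, the frame_id field, and the distance tool are hypothetical names invented for this example.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class Bounds:
    """Hypothetical bounds derived from a task constraint."""
    frame_id: str                  # the only reference frame calls may use
    allowed_tools: FrozenSet[str]  # the only tools the objective permits


class ConstraintViolation(Exception):
    """Raised when a proposed tool call falls outside the bounds."""


def run_tool_call(call: Dict[str, Any], bounds: Bounds,
                  tools: Dict[str, Callable[..., Any]]) -> Any:
    """Execute a proposed call only if it respects the bounds."""
    name, args = call["tool"], call["args"]
    if name not in bounds.allowed_tools:
        raise ConstraintViolation(f"tool {name!r} is not permitted")
    if args.get("frame_id") != bounds.frame_id:
        raise ConstraintViolation("call uses a different reference frame")
    return tools[name](**args)


def distance(frame_id: str, a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Toy deterministic tool: Euclidean distance between two points."""
    return math.dist(a, b)


tools = {"distance": distance}
bounds = Bounds(frame_id="camera_0", allowed_tools=frozenset({"distance"}))

ok_call = {"tool": "distance",
           "args": {"frame_id": "camera_0", "a": (0.0, 0.0), "b": (3.0, 4.0)}}
print(run_tool_call(ok_call, bounds, tools))  # 5.0
```

A call that names an unlisted tool or a different frame raises ConstraintViolation instead of silently producing a number, so a geometrically inconsistent plan fails loudly rather than yielding a plausible but wrong answer.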