SpatialEvo: Self-Evolving Spatial Intelligence via Deterministic Geometric Environments

📅 2026-04-15

🤖 AI Summary
This work addresses error accumulation and limited performance gains in 3D spatial reasoning, both caused by relying on model consensus to generate pseudo-labels. To overcome this, the authors propose SpatialEvo, a framework that exploits the deterministic nature of 3D geometry to build a Deterministic Geometric Environment (DGE), which turns unlabeled scenes into a zero-noise interactive oracle. SpatialEvo uses a single shared-parameter policy that plays both a questioner and a solver role, so problem generation and problem solving co-evolve, and a task-adaptive curriculum scheduler dynamically prioritizes the task categories on which the model is weakest. Without any human annotations, the framework supports unified self-evolutionary training across 16 spatial reasoning task categories. It achieves the highest average score across nine benchmarks at both the 3B and 7B model scales, with consistent gains on spatial reasoning benchmarks and no degradation in general visual understanding.
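The summary does not say how the task-adaptive scheduler turns per-category performance into a curriculum. A minimal sketch, assuming each category's sampling probability is a softmax over its recent error rate (1 minus solver accuracy) with a temperature tau. The function names `schedule_weights` and `sample_category`, the softmax rule, the temperature value, and the category names are all hypothetical illustrations, not the paper's formula.

```python
import math
import random

def schedule_weights(accuracy, tau=0.2):
    """Hypothetical task-adaptive scheduler: map per-category accuracy to
    sampling probabilities that favor the weakest categories.

    accuracy: dict mapping category name -> recent solver accuracy in [0, 1].
    tau: softmax temperature (assumed); smaller values focus harder on weak tasks.
    """
    # Error rate per category; a higher error rate means a higher priority.
    errors = {cat: 1.0 - acc for cat, acc in accuracy.items()}
    # Numerically stable softmax over error / tau.
    m = max(errors.values())
    exp = {cat: math.exp((e - m) / tau) for cat, e in errors.items()}
    total = sum(exp.values())
    return {cat: v / total for cat, v in exp.items()}

def sample_category(weights, rng=random):
    """Draw one task category for the next training batch."""
    cats = list(weights)
    return rng.choices(cats, weights=[weights[c] for c in cats], k=1)[0]

# Usage with hypothetical category names and accuracies.
acc = {"object_distance": 0.9, "relative_direction": 0.5, "room_size": 0.7}
w = schedule_weights(acc)
# relative_direction has the lowest accuracy, so it receives the largest share.
next_task = sample_category(w, random.Random(0))
```

Because the weights are recomputed from live accuracy, a category that improves automatically loses sampling mass, which is one simple way to get a curriculum without manual design.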

📝 Abstract
Spatial reasoning over three-dimensional scenes is a core capability for embodied intelligence, yet continuous model improvement remains bottlenecked by the cost of geometric annotation. The self-evolving paradigm offers a promising path, but its reliance on model consensus to construct pseudo-labels causes training to reinforce rather than correct the model's own geometric errors. We identify a property unique to 3D spatial reasoning that circumvents this limitation: ground truth is a deterministic consequence of the underlying geometry, computable exactly from point clouds and camera poses without any model involvement. Building on this insight, we present SpatialEvo, a self-evolving framework for 3D spatial reasoning, centered on the Deterministic Geometric Environment (DGE). The DGE formalizes 16 spatial reasoning task categories under explicit geometric validation rules and converts unannotated 3D scenes into zero-noise interactive oracles, replacing model consensus with objective physical feedback. A single shared-parameter policy co-evolves across questioner and solver roles under DGE constraints: the questioner generates physically valid spatial questions grounded in scene observations, while the solver derives precise answers against DGE-verified ground truth. A task-adaptive scheduler endogenously concentrates training on the model's weakest categories, producing a dynamic curriculum without manual design. Experiments across nine benchmarks demonstrate that SpatialEvo achieves the highest average score at both 3B and 7B scales, with consistent gains on spatial reasoning benchmarks and no degradation on general visual understanding.
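The abstract's central claim is that ground truth follows deterministically from point clouds and camera poses, with no model in the loop. A minimal sketch of what that can look like for two hypothetical task categories, object-to-object distance and left/right relative direction in a camera view. It assumes objects are given as lists of (x, y, z) points, the pose is a world-to-camera rotation matrix R and translation t, and the camera frame follows the OpenCV-style convention (x right, y down, z forward). The names `centroid`, `gt_distance`, `gt_left_right`, `validate_question`, and `solver_reward`, and the 5% tolerance, are illustrative assumptions, not the DGE's actual validation rules.

```python
import math

def centroid(points):
    """Mean of the (x, y, z) points belonging to one object."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def gt_distance(points_a, points_b):
    """Euclidean distance between object centroids, in scene units (e.g. metres)."""
    return math.dist(centroid(points_a), centroid(points_b))

def to_camera(point, R, t):
    """Apply a world-to-camera pose: p_cam = R @ p_world + t (R is 3x3, row-major)."""
    return tuple(sum(R[r][c] * point[c] for c in range(3)) + t[r] for r in range(3))

def gt_left_right(points_a, points_b, R, t):
    """Is object A left or right of object B in this view?
    Assumes an OpenCV-style camera frame where +x points right."""
    xa = to_camera(centroid(points_a), R, t)[0]
    xb = to_camera(centroid(points_b), R, t)[0]
    return "left" if xa < xb else "right"

def validate_question(objects, task, names):
    """Hypothetical questioner check: a question is valid only if its task is
    supported and every referenced object exists, so an answer is computable."""
    return task in {"distance", "left_right"} and all(n in objects for n in names)

def solver_reward(pred, truth, rel_tol=0.05):
    """Binary reward against computed ground truth: float answers must fall
    within a relative tolerance (assumed 5%); other answers need an exact match."""
    if isinstance(truth, float):
        return 1.0 if abs(pred - truth) <= rel_tol * abs(truth) else 0.0
    return 1.0 if pred == truth else 0.0

# Toy scene: two objects as small point sets, identity camera pose.
objects = {
    "chair": [(-1, 0, 2), (1, 0, 2)],
    "table": [(2, 0, 6), (4, 0, 6)],
}
I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
t0 = [0, 0, 0]

truth = gt_distance(objects["chair"], objects["table"])  # 5.0
side = gt_left_right(objects["chair"], objects["table"], I, t0)  # "left"
```

The point of the sketch is that every answer is a pure function of geometry: the same scene and pose always yield the same label, so the solver's reward cannot drift toward the model's own errors the way a consensus-based pseudo-label can.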
Problem

Research questions and friction points this paper is trying to address.

3D spatial reasoning
self-evolving
geometric annotation
pseudo-labels
model bias
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-evolving
Deterministic Geometric Environment
3D spatial reasoning
Geometric validation
Dynamic curriculum learning