EscherVerse: An Open World Benchmark and Dataset for Teleo-Spatial Intelligence with Physical-Dynamic and Intent-Driven Understanding

📅 2026-01-04
🏛️ arXiv.org
🤖 AI Summary
This work addresses a critical gap in spatial reasoning research. Existing work often neglects how human intention shapes dynamic spatial configurations, and it lacks a unified evaluation framework that integrates physical laws with goal-directed behavior. To bridge this gap, the authors propose Teleo-Spatial Intelligence (TSI), a paradigm that systematically incorporates intention-driven spatial reasoning. They introduce EscherVerse, an open-world resource comprising the large-scale real-world video dataset Escher-35k and the evaluation benchmark Escher-Bench. Escher-Bench jointly assesses object permanence, state transitions, and trajectory prediction, as well as the human intent behind physical events. The benchmark aims to move spatial intelligence from passive perception toward purpose-oriented, holistic understanding. The authors also develop the Escher series of models, which jointly learn physical interactions and intention inference, with the goal of giving embodied agents foundational capabilities grounded in both physical commonsense and goal comprehension.


📝 Abstract
The ability to reason about spatial dynamics is a cornerstone of intelligence, yet current research overlooks the human intent behind spatial changes. To address this gap, we introduce Teleo-Spatial Intelligence (TSI), a new paradigm that unifies two critical pillars: Physical-Dynamic Reasoning (understanding the physical principles of object interactions) and Intent-Driven Reasoning (inferring the human goals behind these actions). To catalyze research in TSI, we present EscherVerse, which consists of a large-scale, open-world benchmark (Escher-Bench), a dataset (Escher-35k), and models (the Escher series). Derived from real-world videos, EscherVerse moves beyond constrained settings to explicitly evaluate an agent's ability to reason about object permanence, state transitions, and trajectory prediction in dynamic, human-centric scenarios. Crucially, it is the first benchmark to systematically assess Intent-Driven Reasoning, challenging models to connect physical events to their underlying human purposes. Our work, which includes a novel data curation pipeline, provides a foundational resource for advancing spatial intelligence from passive scene description toward a holistic, purpose-driven understanding of the world.

Problem

Research questions and friction points this paper is trying to address.

Teleo-Spatial Intelligence
Physical-Dynamic Reasoning
Intent-Driven Reasoning
spatial dynamics
human intent

Innovation

Methods, ideas, or system contributions that make the work stand out.

Teleo-Spatial Intelligence
Intent-Driven Reasoning
Physical-Dynamic Reasoning
Open-World Benchmark
Human-Centric Spatial Understanding